INTRODUCTION


While this issue of the Review is essentially similar in scope and
organization to the issue of February 1944, certain changes in organi- 
zation and emphasis may be noted: 

1. Additional space has been given to the measurement of personality 
and special abilities. Within the field of personality, extra space has been 
given especially to the topics of attitudes and projective technics. These 
changes reflect changes in research emphases during the last triennium. 

2. There is no chapter devoted to the construction and use of psycho- 
logical tests in the armed services. A separate issue of the Review will 
cover this topic; in the present issue, only occasional or incidental ref- 
erence is made to the findings or experience of the armed forces. 

3. The former three chapters on personality have been converted to 
four separate subject-chapters on Personality Questionnaires, Interests 
and Attitudes, Rorschach Methods and Other Projective Technics, and 
Other Devices for Investigating Personality. The purpose of this change 
is fivefold: (a) to permit greater specialization by the reviewers (“personality”
now has such a broad, voluminous literature that division of labor
is essential); (b) to encourage integration of the material on “construction
and evaluation” with that on “applications” (formerly, these topics
were treated in separate chapters); (c) to meet reader-interest and
reader-expectations: thus, the reader interested in personality questionnaires may
turn to a single chapter, and find his material there; (d) to provide the 
abstract-journals with chapter titles which are more specific and revealing 
than formerly; and (e) to reduce the overlap of bibliographic entries in 
the chapters formerly devoted to “construction and evaluation” on the
one hand, and “applications” on the other. 

The reader will miss a chapter on the interrelations and synthesis of 
test results. Research materials in sufficient amount to support such a 
chapter have not appeared. This has for some time been a major gap in 
the research on personality and intelligence testing. 

While the prosecution of the war undoubtedly reduced the volume of 
published reports, there was no dearth of studies for the present issue of 
the Review. As in previous issues, bibliographies have had to be rather 
sharply selective. The Review, in fact, needs more space. More space is 
needed to permit authors to cite all the useful references, and at the same 
time present gracefully written, interesting, critical, and suggestive accounts. 
Thoughts take space; and we cannot limit the latter without sacrificing 
the former. 

Finally, the chairman wishes to acknowledge the help of Dr. Harold
H. Abelson, who has at many points rendered invaluable assistance in
the preparation of this issue.


Herbert S. Conrad, Chairman,
Committee on Psychological Tests. 





CHAPTER I 


Overview and Comments 


HERBERT S. CONRAD 


OF THE BOOKS which appeared during the last triennium, at least five
deserve special notice; namely, the volume by Strong on Vocational
Interests of Men and Women (10); the monograph by Carter on Vocational
Interests and Job Orientation (3); the monograph by Munroe on Prediction
of the Adjustment and Academic Performance of College Students
by a Modification of the Rorschach Method (7); the two-volume work by
Rapaport, Gill, and Schafer on Diagnostic Psychological Testing (9);
and finally, the volume by Crawford and Burnham on Forecasting College 
Achievement (4). It will be noticed that two of these books deal explicitly 
with college students; and a third, the volume by Strong, pays more atten- 
tion to college students than to other groups. While it may be agreed 
that college students are important, we doubt whether they deserve such 
a concentration of research effort. By comparison, the work with high- 
school students is indeed limited. 

An important, broadly inclusive bibliographic reference work is the 
publication by Hildreth (5). 

Outside of the armed services (whose work with psychological tests 
will be covered in detail in a separate issue of the Review), the chief 
increase of research during the triennium has been in the field of per- 
sonality and special abilities. In the latter field, outstanding work was 
done in the development of visual tests for industry, in the development 
and refinement of color-vision tests, and in the verification of group- 
audiometer tests with school children. Outstanding, tho not conclusive, 
is the series of studies conducted under Barr’s direction on the measure- 
ment of teachers’ efficiency (see Chapter III). In the field of personality, 
expansion has been noteworthy in several directions: 

1. There has been active test construction, especially in the field of projective
technics and of attitudes. 

2. There has been increased attention to the possibilities of using ability tests as 
measures of personality (both normal and abnormal). 


3. Research, tho it does not appear to have kept pace with applications, has neverthe- 
less been extended and improved. 


In the very active field of the Rorschach Test, the author’s claims for 
the Harrower-Erickson Multiple Choice Group Rorschach Test have not 
been borne out by other investigators (see Chapter VI). On the other 
hand, Munroe contributed an important study demonstrating validity for 
her inspection technic (7). 

Little encouragement can be found in Chapter IV for the use of per- 
sonality questionnaires; evidently the most careful research is needed 
to discover the conditions favorable to validity. Our guess is that, given 
fully cooperative subjects who are not excessively abnormal, and given
a well-devised, empirically checked questionnaire designed for the type 
of individual under examination, it should be possible to obtain useful 
information. In this connection, it is interesting to notice that both 
Adams (1) and Burgess and Wallin (2) reported fair-sized validity 
coefficients (.30-.50) for scales designed to predict marital happiness; and 
these scales contain many items from personality questionnaires. 

The armed forces made extensive use of highly abbreviated tests— 
both of intelligence and neurotic tendencies. Special interest attaches to 
the usefulness of brief questionnaires (oral or written) for the “screening” 
of psychoneurotics: the results, as reported, were highly favorable—and 
the question arises why the armed services could get such successful 
results from brief questionnaires, when civilians have trouble demonstrat- 
ing any validity for the full-length scales. The following possibilities are 
suggested for consideration:* 


1. The sample entering the armed forces was extremely heterogeneous, including
at the lower end unemployables, loafers, “bums,” alcoholics, frank neurotics, etc. It 
would have been relatively easy, by any technic, to eliminate such characters. 

2. There was some special compulsion on the parts of the subjects to tell the truth. 

3. The classification required was very crude: no specific diagnosis was made, merely 
a classification into acceptable versus unacceptable. 

4. There was overlap between the questionnaire items and the criterion; that is to 
say, the psychiatrist’s judgment was probably based, at least in part, on the same types 
of questions as contained in the questionnaire. This leads to a spuriously high coeffi- 
cient of validity for the questionnaire items, since it disregards the discrepancy between 
the psychiatrist's prognosis based on the questions, and the actual outcome. 

5. Sometimes there was contamination of the criterion; i.e., the psychiatrist knew 
the results of the questionnaire, and allowed them to influence his decision. In this 
circumstance, a validity coefficient based on the psychiatrist’s judgment as criterion 
begs the question. 

6. Occasionally the statistical technic used to evaluate the efficiency of the ques- 
tionnaire was faulty. Thus, a biserial r might be based on 100 normal and 100 abnormal 
individuals: when actually the biserial r should have been based on, say, 10,000 normal 
and 100 abnormal individuals (if the ratio of normal to abnormal was 100:1). 

7. There is room for doubt whether the criterion relied upon was adequate. The 
veterans hospitals are extremely full—and not with patients screened and culled suc- 
cessfully by extremely brief personality tests. Very likely the armed services, in their 
use of the short methods, considered it advisable to reduce the number of “false posi- 
tives,” at the cost of increasing the number of “false negatives.” If so, this represents 
an administrative adjustment to the uncertainties of the diagnostic technic; and the 
increase in the number of false negatives (some of whom doubtless ended in veterans 
hospitals) must be charged to the inadequacies of that technic. 
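
The sixth possibility can be illustrated numerically. The sketch below is not taken from any of the studies reviewed: it uses the point-biserial coefficient as a simpler stand-in for the biserial r mentioned above, and it assumes, purely for illustration, that the abnormal group scores one within-group standard deviation higher than the normal group. The function name and the effect size are hypothetical.

```python
import math

def point_biserial(d, n_abnormal, n_normal):
    """Point-biserial r between a test score and group membership.

    d is the difference between the two group means, expressed in
    within-group standard deviation units (a hypothetical value,
    assumed to be the same whatever the mix of the sample).
    """
    p = n_abnormal / (n_abnormal + n_normal)  # proportion abnormal
    q = 1.0 - p
    # Total variance = within-group variance (1) + between-group part (p*q*d^2)
    return d * math.sqrt(p * q) / math.sqrt(1.0 + d * d * p * q)

balanced = point_biserial(1.0, 100, 100)      # artificial 1:1 sample
realistic = point_biserial(1.0, 100, 10_000)  # 100:1 ratio, as in the text

print(f"1:1 sample:   r = {balanced:.3f}")
print(f"100:1 sample: r = {realistic:.3f}")
```

With the same separation between the groups, the coefficient obtained from the balanced sample is several times larger than the one appropriate to the ratio found in the population.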


* This section was written with Albert Ellis.

A commendable trend during the triennium has been the increased
use of multiple measures, or batteries of tests. Both Crawford and Burnham
(4) and Rapaport, Gill, and Schafer (9) exemplify this tendency.
When many measures are brought to bear on a problem, the opportunity
arises for differential prediction in the field of abilities, and for more
definite diagnosis and interpretation in the field of personality. In the
field of personality, the interpretation of data is typically subjective and
usually not easily verifiable. In the field of abilities, the combination of 
the various measures into a single score tends to be mathematical or 
routine. Sometimes the objective test-scores of abilities are supplemented 
by impressions from interviews, recommendations, etc. When this is done, 
it is highly important to make sure that the supplementary data actually 
improve validity. The writer is familiar with two instances where the addi- 
tion of interview data has reduced validity substantially, instead of raising 
it. Evidently what happens in such cases is that the supplementary data 
are allowed to have an excessive influence upon the final judgment con- 
cerning the individual’s capacities or upon his final score. In this event, 
the supplementary data are worse than useless, since they have weakened 
validity, instead of strengthening it. 

Probably the most fundamental need in the field of psychological tests 
today is the development of reliable, valid, specific criteria against which 
to measure the efficiency of tests. This need is urgent in the field of 
abilities, and still more urgent in the field of personality. Tests which aim 
to predict or measure a faulty criterion merely perpetuate the errors of 
the criterion. Perhaps one reason for the undesirably high intercorrela- 
tions among tests in a battery is that the tests have typically been validated 
against an unanalyzed, nonspecific criterion (such as average school grade 
or grade-point average). It is most essential that criteria, as well as tests, 
be analyzed into their component parts. We judge that the analysis of 
tests according to correlations with specific criterion-elements should 
prove as rewarding as analysis by the various self-contained systems 
classified under factor analysis. 

The increasing use of factor analysis is evidence of its value for the 
statistical study of interrelations. Recent results from factor analysis have 
tended to reinstate the “general factor” to a position of importance, 
especially for samples drawn from the younger ages, and for samples 
widely heterogeneous in abilities. It is highly unlikely, however, that the 
“general factor” obtained in various studies is identical. 

The original purpose of factor analysis was to lead to the develop- 
ment of new tests which should measure more directly the independent 
abilities identified by factor analysis. Lovell’s (6) work reports an inter- 
esting and partly successful attack on this problem. 

Chapter VIII describes a variety of interesting statistical advances. 
Unfortunately, a number of statistical fallacies marred several researches. 
Perhaps the most egregious is that of Piotrowski et al., who tried out 
many Rorschach “signs” on a small sample (N=86) of mechanical 
workers, and concluded that four “signs” differentiated between the out- 
standing and the nonoutstanding workers; this differentiation had a “dis- 
criminative value of .846” (8, p. 150). The obvious danger of this pro- 
cedure is the capitalization of chance; the obvious requirement is the 
validation of the four “signs” in a fresh sample. The “discriminative value 
of .846” may well be greater than the reliability of any of the four
“signs,” or of the criterion of mechanical ability. One other error, sometimes
made by those working with “screen” tests, must be mentioned.
The important issue, with such tests, is usually the number of “false-posi- 
tives,” not the percent. If, for example, in a population of 10,000 children 
subjected to screening, the percent of “false-positives” is (say) “only 5 
percent,” the number of “false-positives” is 500—a very considerable 
number, for all practical purposes. 
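
The capitalization of chance described above can be shown by a small simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of the study criticized: every "sign" and the criterion are independent coin flips, so no sign has any true validity; the number of signs tried (40) is hypothetical, since the source says only "many"; and the simple agreement rate used here is not the "discriminative value" reported by the authors.

```python
import random

def agreement(sign, criterion):
    """Proportion of subjects for whom the sign matches the criterion."""
    return sum(s == c for s, c in zip(sign, criterion)) / len(criterion)

def coin_flips(rng, n):
    """n independent true/false values, each with probability one-half."""
    return [rng.random() < 0.5 for _ in range(n)]

def chance_capitalization(n_subjects=86, n_signs=40, n_keep=4,
                          n_trials=100, seed=0):
    """Average agreement of the best signs, in the original and a fresh sample."""
    rng = random.Random(seed)
    in_sample, fresh = [], []
    for _ in range(n_trials):
        criterion = coin_flips(rng, n_subjects)
        scored = []
        for j in range(n_signs):
            a = agreement(coin_flips(rng, n_subjects), criterion)
            keyed_positive = a >= 0.5  # a sign may be keyed in either direction
            scored.append((max(a, 1.0 - a), j, keyed_positive))
        # Keep the signs that look best in the original sample
        best = sorted(scored, reverse=True)[:n_keep]
        in_sample.append(sum(b[0] for b in best) / n_keep)

        # Try the same (worthless) signs, with the same keying, on new subjects
        new_criterion = coin_flips(rng, n_subjects)
        fresh_values = []
        for _, _, keyed_positive in best:
            a = agreement(coin_flips(rng, n_subjects), new_criterion)
            fresh_values.append(a if keyed_positive else 1.0 - a)
        fresh.append(sum(fresh_values) / n_keep)
    return sum(in_sample) / n_trials, sum(fresh) / n_trials

original, cross_validated = chance_capitalization()
print(f"selected signs, original sample: {original:.2f}")
print(f"same signs, fresh sample:        {cross_validated:.2f}")
```

The selected signs appear to discriminate in the sample used to select them and fall back to the chance level in the fresh sample, which is why validation in a new group is required.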

Space is lacking for further discussion of detailed results. Despite 
occasional cause for criticism, the advances of the triennium justify pride. 
What is needed now is a larger army of research workers, well financed 
and well organized, to tackle the numerous problems that still await
solution.



CHAPTER II


Construction, Evaluation, and Applications of Intelligence Tests


WARREN G. FINDLEY, WILLIAM W. TURNBULL, and HERBERT S. CONRAD


THE LAST TRIENNIUM has seen progress in all departments of intelligence
testing. The reader can verify the vitality of the period by observing: the 
development of tests which incorporate new features or strike a new 
path; the determination of basic facts or interrelations; the closer quanti- 
fication of knowledge, leading not merely to the addition of decimals, 
but sometimes to new questions or a new orientation (for example, determination
of the correlation between intelligence and yearly learning-gains);
the clarification or solution of some technical issues in test construction;
some new discoveries or insights (e.g., the demonstration of a
general-intelligence factor in adults, and the consequent elimination of
“maturation” as an explanation of this factor at the younger ages);
the use of samples which reveal greater insight into the problems at 
issue; and finally, the more adequate fulfilment of the scientific require- 
ments of investigation. These advances are, of course, related inter se. 
As mentioned in the chairman’s Introduction, the work of the military
psychologists in the armed services will be covered by others in a separate
issue of the Review.


Test Construction


New Tests


Group tests—Tiffin and Lawshe (112) prepared two forms of a brief 
Adaptability Test (35 items, 15 minutes). Reliability, either by the split- 
half or alternate forms procedure, was found to approach .90; data on 
validity were also presented. Another brief new test is the Thurstone Test 
of Mental Alertness (111) (98 items, 20 minutes), consisting of arith- 
metical problems, definitions, number-series, and antonyms; separate 
L (Linguistic) and Q (Quantitative) scores are obtained. Norms are pre- 
sented for grades nine thru twelve. 

Two nonlanguage tests were published: one by Pintner (92), the other 
by Penrose (90). The test by Pintner includes six subtests, and requires 
50 minutes of working time; the test by Penrose includes only one type 
of problem (selecting the extraneous pattern of a series of five), and 
requires 30 minutes. 

The Word-Dexterity Test prepared by Peterson (91) is a test of knowl- 
edge of the meaning of prefixes and suffixes; impressively high figures 
were obtained for both reliability and validity. The test developed by 
Johnson (61) was based on Dewey’s well-known analysis of the reflective 
process. Test items were constructed to represent in the elaboration of a 
single problem, typical good and poor alternative reactions at each stage of
the thought process. This type of approach to intelligence test construc- 
tion, involving the explicit application of a theory of thought processes, 
has been largely neglected of late, in favor of theories of the independent 
components (or “vectors”) of intelligence. 

Louis and Thelma Thurstone (110) published the Chicago Tests of 
Primary Mental Abilities, which exclude five tests from the 1941 edition, 
and take correspondingly less time (two hours) to administer. 

A civilian edition of the United States Armed Forces Institute Tests of 
General Educational Development (121) has been made available thru 
the Cooperative Test Service of the American Council on Education. 
Separate tests have been issued for the high-school and the college levels, 
respectively. It appears that the tests on Interpretation of Reading Materials
in the Social Studies, Interpretation of Reading Materials in the
Natural Sciences, and Interpretation of Literary Materials could be used 
successfully as tests of intelligence or scholastic aptitude, for individuals 
with normal school experience.

Individual tests—Individual tests continue to be prepared (a) for the 
preschool group, (b) for “problem” children, and (c) for study of the 
abnormal or aged adult. 

At the preschool level, Smith (104) presented a Test of General Infor- 
mation consisting of ninety-two carefully selected items. Shotwell and 
Gilliland (101) described a scale for the measurement of the mentality 
of infants. 

Arthur (4, 5) presented a Stencil Design Test which offers opportunity
for the clinical observation of problem-solving behavior, as well as yielding 
the mental-age level of the subject. 

Specially designed for use with abnormal adults are the Goldstein- 
Scheerer Cube Test (40), the Weigl-Goldstein-Scheerer Color-Form Sorting 
Test (41), and the Goldstein-Scheerer Stick Test (42), all of which may 
be very briefly described as nonverbal reasoning tests. The Wechsler 
Memory Scale (124), yielding a Memory Quotient, was standardized in 
the same manner as the Wechsler-Bellevue Intelligence Scale. Hayman 
(55) recommended, as a sensitive indicator of mental deterioration, a test 
consisting of the serial subtraction of sevens from 100. 


Abbreviated Scales 


One consequence of the wartime need for personnel classification on 
a gigantic scale was the growth of a demand for rapid measurement 
technics. Particular attention has been devoted to the problems of deriving 
a serviceable abbreviated form of the Wechsler-Bellevue Test. Rabin (94) 
was among the first to undertake this task. He selected three subtests 
(comprehension, arithmetic, similarities) from the verbal half of the 
examination. A further abbreviation was effected by Cummings, MacPhee, 
and Wright (27), who dropped the similarities from Rabin’s scale. Gurvitz 
(48) selected the Picture Arrangement and Digit Repeating Subtests as 
an abbreviated Bellevue Scale. It may be suggested in passing that a study
using the multiple correlation technic, and based on clearly defined 
samples, is essential if the problem of the most effective combination of 
subtests is to find a conclusive solution. 
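
How the multiple correlation technic would compare candidate pairs of subtests may be sketched as follows. The formula is the standard one for the multiple correlation of a criterion (here, the full-scale score) with two predictors; all the correlations in the example are hypothetical and were invented for illustration, not taken from any of the studies cited.

```python
import math
from itertools import combinations

def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each subtest with the criterion;
    r12: correlation between the two subtests.
    """
    r_squared = (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)
    return math.sqrt(r_squared)

# Hypothetical correlations, invented for illustration only.
with_full_scale = {"comprehension": 0.70, "arithmetic": 0.65, "similarities": 0.72}
between_subtests = {
    ("comprehension", "arithmetic"): 0.45,
    ("comprehension", "similarities"): 0.60,
    ("arithmetic", "similarities"): 0.40,
}

# Rank every pair of subtests by its multiple correlation with the full scale
ranked = sorted(
    ((multiple_r(with_full_scale[a], with_full_scale[b], between_subtests[(a, b)]), a, b)
     for a, b in combinations(with_full_scale, 2)),
    reverse=True,
)
for r, a, b in ranked:
    print(f"{a} + {b}: R = {r:.3f}")
```

In this invented example the best pair is not the pair with the highest separate correlations, because the correlation between the two subtests also enters the result. A conclusive answer would further require the clearly defined samples called for above.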

Using records for 500 mentally defective patients, Spaulding (106) 
found correlations of .96 to .99 between mental ages obtained from the 
full Stanford-Binet and those obtained by rescoring the papers on the 
abbreviated scale. Spache (105) pointed out, on the basis of his results, 
that abbreviated testing was less accurate among bright children than 
among subnormals. 

The movement toward short tests reached its logical extreme in the 
work of Hildreth (57), who developed single-item tests for the prelimi- 
nary screening of naval recruits. 

The place of sharply abbreviated tests in the main current of progress 
has yet to be established. The evidence indicates that the abbreviated 
version of a reliable test may serve nearly as well as the full test. But 
if precision of measurement should be needlessly sacrificed for adminis- 
trative convenience in situations where even our best instruments are in 
need of refinement—the condition which usually prevails today—then the 
availability of short scales will prove a disservice to psychometrics. 
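
The cost in precision of shortening a test can be estimated with the Spearman-Brown formula, which the passage above does not name but which is the usual tool for the purpose. The sketch assumes that the items retained are parallel to those dropped; the full-length reliability of .90 is a hypothetical figure.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length is multiplied by length_ratio."""
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)

full_length = 0.90  # hypothetical reliability of the full test
for fraction in (1.0, 0.5, 0.25, 0.1):
    shortened = spearman_brown(full_length, fraction)
    print(f"{fraction:>4.2f} of full length: r = {shortened:.3f}")
```

Under these assumptions a half-length form of a highly reliable test loses little, while drastic abbreviation loses a good deal, which agrees with both halves of the judgment expressed above.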


Technical Considerations in Test Construction 


One of the purposes of factor analysis is to lead to the development 
of new tests which should measure more directly the independent abilities 
identified by the factor studies. That even empirically selected tests are 
likely to have a somewhat complex factor pattern was demonstrated by 
Goodman (43). Lovell (73) undertook to “follow thru” the implications 
of factor studies by devising items which should specifically stress the 
characteristics of one factor (so far as these characteristics could be 
recognized). Her attempt was reasonably successful, if judged by stand- 
ards appropriate to a pioneer effort; further work along this line appears 
justified. 

Davidson and Carroll (29) investigated the contributions of speed and 
level of performance to time-limit scores on a number of relatively simple 
group tests. They found that speed and level scores were related to the 
extent of about .30-.50, and that both contributed to time-limit scores. 

The relation between the number of response options in a multiple- 
choice test and test reliability was studied by Lord (70), who developed 
a formula to predict changes in the reliability of a test resulting from a 
change in the number of choices per item. The same general problem was 
treated by Ferguson (32), who indicated the maximum reliability that 
can be attained when multiple-choice items with 2, 3, 4, or 5 choices 
are employed. 

Mosier and Price (86) made available a scheme for use in arranging 
response options of multiple-choice items, but noted various situations 
in which complete randomization is not desirable. Practically negligible
influence of the position of the correct option on the percent of subjects 
who select it was reported by McNamara and Weitzman (82). 

Gulliksen (47) formulated three theorems describing the relation of 
item difficulty and inter-item correlation to test variance and reliability. 
He showed that decreasing the range of difficulty of the items tends to 
raise test reliability. Tucker (119) showed that under certain conditions 
maximum test validity is achieved when average item reliability is less 
than +.3. 

Smith (103) showed that the selection of test items is very similar, 
whether an internal or an external criterion is used in making the selec- 
tion. The correlation between the two criteria in Smith’s study, however, 
was .86; a comparison of the correspondence in item selection under 
the more usual condition of lower total test validity would be desirable. 

The appropriate difficulty criterion for the allocation of test items to 
the proper age levels in age scales was cogently discussed by Jaspen (60). 


Problems in Test Construction 


Foremost among the problems in test construction is that of lowering 
the intercorrelations among the tests of a battery, while raising the 
over-all validity of the battery. Recent years have brought greater emphasis 
on specific or differential prediction; such prediction depends upon 
reasonably independent tests; and the development of such tests depends 
in turn upon reliable, specific, reasonably independent criteria. 
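
Why lower intercorrelations raise the validity of a battery can be seen from the standard formula for the validity of an equally weighted sum of standardized tests. The sketch below is illustrative only: the function name is hypothetical, and the figures (four tests, each with a validity of .40) are invented.

```python
import math

def composite_validity(n_tests, mean_validity, mean_intercorrelation):
    """Validity of an unweighted sum of n standardized tests.

    mean_validity: average correlation of the tests with the criterion;
    mean_intercorrelation: average correlation among the tests themselves.
    """
    # Standard deviation of the sum of n unit-variance tests
    total_sd = math.sqrt(n_tests + n_tests * (n_tests - 1) * mean_intercorrelation)
    return n_tests * mean_validity / total_sd

for r_between in (0.8, 0.6, 0.4, 0.2):
    validity = composite_validity(4, 0.40, r_between)
    print(f"intercorrelation {r_between:.1f}: battery validity = {validity:.3f}")
```

With the validity of the separate tests held constant, the battery gains steadily as the tests become more independent of one another.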

Another question relates to the timing of tests. In what situations, if 
any, are speeded tests preferable to unspeeded? To what extent does the 
answer to this question depend on the type of material involved (e.g., 
mathematics versus verbal aptitude), the sample, the purpose, etc.? 

A basic problem is the relationship between recognition and recall as 
applied to testing. Courtney, Bucknam, and Durrell (23) attacked this 
problem, and found that scores on a multiple-choice intelligence test corre- 
lated more highly with multiple-choice recall (i.e., recognition) than 
with written or oral recall (free recollection of the same material). While 
this study is based on too few cases to be conclusive, it points up an 
important area for investigation. 

Several other technical problems deserve notice. What are the rules 
that should be followed in item-writing? While item-writers undoubtedly 
develop a sense of the requirements of good items of each common type, 
no detailed, explicit codification of rules exists—which could be sub- 
jected to experimental verification and serve as a useful guide to the 
novice. Under what conditions do pretesting and item analysis justify 
the time and expense involved? Despite the paucity of evidence, the general
value of pretesting and item analysis is one of the most firmly held
dogmas among testers. What, exactly, is the best practical distribution of 
item difficulty from the viewpoint of test validity? What means can be 
taken to minimize the effects of coaching for competitive examinations?
Does the test-wise student who has attempted several standardized tests
have a significant advantage over the “fresh” student who takes only one
such test?

All the questions mentioned above take on added importance, in view 
of the increasingly widespread use of tests in education. 


Evaluation 


A survey of the literature of the last three years indicates quite clearly 
that the evaluation of intelligence tests tends to be limited largely to 
determinations of reliability, to correlations with other intelligence tests, 
and to correlations with school grades. Relatively seldom is a serious effort 
made to ascertain to what extent the test (or each separately scored part 
of the test) measures only one common factor—instead of a composite 
or medley of several factors. Less often is an attempt made to determine 
such matters as susceptibility to coaching, susceptibility to practice-effect, 
ease of scoring, arrangement of items in order of difficulty, validity of 
norms, the effect of the time-limit (or “speed” factor) on reliability and 
validity, etc. Practically never is any effort made to determine the effect, 
upon the subject, of his experience in taking the test (to what extent,
for example, does the test stimulate or confirm feelings of inferiority and 
aversion to matters intellectual?). Perhaps it is too much to ask for a 
complete evaluation of any one test. In any event, some of the limitations 
of the account below must be charged to paucity of the pertinent research 
literature. 

One excellent basis for the evaluation of an intelligence test is the corre- 
lation between the test and school grades or scholastic achievement tests. 
Studies of this type are covered in a subsequent section, under Applica- 
tions of Intelligence Tests. Data on the constancy of intelligence ratings 
will also be found in that section. 


Correlations with Other Intelligence Tests 


Cursory survey of the literature is sufficient to show that the correlations 
between different intelligence tests are considerably lower than the corre- 
lation between repeated administrations or alternate forms of a good single 
test. (This generalization may not apply fully to the original versus the 
revised Stanford-Binet Test.) Since lack of space prevents a detailed citation
of the various correlations, the interested reader is directed to the
following references: 26, 28, 39, 53, 64, 71, 81, 98, 115, 116, 118, 122. 


Correlations with “the Ability To Learn” 


Woodrow, reviewing an extensive series of studies, concluded that 
“individuals possess no such thing as a unitary general learning ability,” 
and that “the ability to learn cannot be identified with the ability known 
as intelligence” (131, p. 148). This report by Woodrow has the wholesome
effect of stimulating critical inquiry into the determinants of learning- 
gains; on the other hand, it is difficult to believe that long-time learning- 
gains in the general content of the school curriculum are unrelated to 
intelligence. 


Reliability Coefficients


Relatively few reliability coefficients were published during this last 
three-year period. McCarthy (78) reported that the correlation between 
initial scoring and rescoring of the Goodenough Drawing Test of Intelli- 
gence, by the same scorer, was .94, and by different scorers, .90. The 
correlation between scores on two drawings done a week apart by the 
same children, when both drawings were scored by the same person, 
was .68. Tyler (120) found the equivalent-forms reliability of total raw 
score on the Terman-McNemar Tests of Mental Ability to be .94. The 
odd-even (corrected) reliability coefficient for the total verbal score of 
the College Entrance Examination Board’s (20) Scholastic Aptitude Test 
was reported as .96, and for the mathematical score, .95. 


Intercorrelations among Parts 


Crawford and Burnham (26) found the intercorrelations among the 
separate parts of the General Educational Development Tests to be too 
high for the tests to be useful in differential prediction. The College 
Entrance Examination Board (20) reported intercorrelations among the 
verbal subtests of its Scholastic Aptitude Test to be about .75. Lorge (71) 
reported the average intercorrelation among the three parts of the Thorn- 
dike Intelligence Examination for High School Graduates to be over .90. 


Norms 


Rabin (95), reviewing the literature on the Wechsler-Bellevue Test, 
cautioned on the incomparability of IQ’s from the Wechsler-Bellevue and
the revised Stanford-Binet Tests (dull subjects tend to make higher IQ’s 
on the Wechsler-Bellevue, and bright subjects, lower). Parkyn (88) 
compared IQ’s from the original and the revised Stanford-Binet; he 
concluded that, for children testing under 80 IQ, the IQ’s on the two 
scales are comparable with regard to implications for institutionalization. 
Wimberly (130) pointed out a systematic error or peculiarity in the 
Kuhlmann-Anderson Norms which makes it especially important that
the appropriate series of subtests be selected for administration to each
child.


Wide-Range Testing on the Revised Stanford-Binet Test 


In a sample of 126 cases, Bradway (8) reported a correlation of .99
between IQ’s obtained by standard versus wide-range testing.


Validity of the General Educational Development Tests


Three college studies (7, 26, 31) have been made of the Armed Forces 
Institute General Educational Development Tests. In general, the tests 
were found to succeed as measures of verbal aptitude, but to fail as 
measures of college achievement. 


Organization of Abilities 


Perhaps the most significant study of the organization of abilities during 
this three-year period was carried out by the staff of the Division of Oc- 
cupational Analysis of the War Manpower Commission (123). Several 
experimental batteries were administered to a total of 2156 male adults 
aged seventeen to thirty-nine—either applicants for, or trainees in, Voca- 
tional Education National Defense Training courses. Analysis by Thur- 
stone’s Method revealed group-factors described as Verbal (V), Numerical 
(N), Spatial (S), Perceptual (P and Q), Aiming (A), Finger Dexterity 
(F), Manual Dexterity (M), and Logic or Reasoning (L). Two general fac- 
tors were also found: one, a speed factor (7) (all the tests were speed tests, 
with time limits in the neighborhood of five minutes); the other, a factor 
that “appears to have some of the properties of Spearman’s G . . . (and) 
to possess many of the properties that teachers, test examiners, and clinical 
psychologists would attribute to ‘intelligence’” (123, p. 152). As the 
authors remark, the establishment of this latter factor in a sample of 
adults disposes of some theories that such a factor could be found only 
among children, and that it amounts to a common maturational factor. 

Previous studies have led to the view that mental abilities become more 
specific (show lower intercorrelations) with age. Clark’s (18) study 
confirms this view, and Reichard’s (97) study provides qualified sup- 
port. Blumenfeld’s (6) study, presenting the intercorrelations among the 
subtests of the Terman Group Test of Mental Ability for Peruvian chil- 
dren aged twelve thru sixteen, runs counter to the bulk of evidence, by 
finding slightly higher correlations at the older ages. 

It is sometimes hypothesized that high scores in mental tests are more 
likely to reflect special abilities than are average or low scores. On this 
hypothesis (and in the absence of counterbalancing motivational factors), 
bright children might be expected to exhibit greater variability in educa- 
tional achievement than average or dull children. The results of Gray 
(45) fail to support this hypothesis, at least for young children, since in 
a sample of 600 sixth-graders she found that the dull, rather than the 
bright, showed the greater variability of achievement. 

Halstead (49) has suggested the possibility of a physiological energiz- 
ing or “power” factor (P), by which the usable intelligence of an indi- 
vidual is affected; and he suggests that brain injury, clinical depressions, 
etc., may spare the “primary” mental abilities while impairing the 
energizing factor or the usable intelligence. 
Applications of Intelligence Tests 1 


For the reader’s convenience, the present section follows, so far as pos- 
sible, the same organization as in Freeman’s (33) 1944 summary. 


Intelligence Tests and Educational Achievement 


Elementary school—Allen (1, 2) explored the correlation of group 
intelligence test measures at the middle of grade one and at the beginning 
of grade four, with achievement on standardized tests in grades three and 
four. First-grade intelligence test indices correlated only .40-.50 with such 
achievement; fourth-grade intelligence test measures correlated approxi- 
mately .70-.75 with current achievement results. The group intelligence 
test results correlated less highly with ability in arithmetic computation 
than other aspects of educational achievement involving reading. 
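
As a rough gloss on the magnitudes reported above, squaring a correlation gives the proportion of variance that two measures share. The sketch below applies that standard identity to the correlations quoted in the text; the function name is illustrative only, and the computation is not part of Allen's study.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two measures that correlate r (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlation ranges quoted in the text (Allen's first- and fourth-grade results).
reported = {
    "first-grade tests vs. later achievement": (0.40, 0.50),
    "fourth-grade tests vs. current achievement": (0.70, 0.75),
}

for label, (low, high) in reported.items():
    # Print the shared variance at each end of the reported range.
    print(f"{label}: {shared_variance(low):.0%} to {shared_variance(high):.0%}")
```

On this reading, the first-grade measures account for roughly a sixth to a quarter of the variance in later achievement, and the fourth-grade measures for about half.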

Strang (107) reworked Gans’ data on reading test scores and group 
intelligence test scores of 417 children in the intermediate grades to show 
the substantial variability in reading scores even for pupils whose language 
scores on a group intelligence test are limited to a range of ten months 
of mental age. She also reported closer correlation of language mental 
ages with scores on a standard reading comprehension test than with 
scores on an application-of-reading test (Gans-Lorge Test of Critical 
Reading)—a finding which may be added to other criticisms of group 
tests as measures of intelligence. 

Woodrow (132) found that annual gains of 414 intermediate-grade 
pupils on the six subtests of the Metropolitan Achievement Tests, Partial 
Examinations, showed low intercorrelations (average = .12) and low cor- 
relations with Otis IQ (average = .20), except in gains during grade five. 
He argued from these findings that intelligence is overrated as a factor 
influential in yearly achievement gains. His case would have been clearer 
if he had used average MA rather than average IQ, since MA rather than 
IQ is the measure of immediate mental ability, and if he had also studied 
average gain on the six subtests against this more proper measure of 
intelligence. His study leaves open the possibility that intelligence, as 
measured by MA, promotes average gains while shifts of interest or 
emphasis account for low intercorrelations of subtest gains and consequent 
low correlations with intelligence. 
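
The distinction between MA and IQ drawn here can be made concrete with the conventional ratio definition, IQ = 100 x MA / CA, under which two pupils of equal IQ but different ages stand at different mental ages. The sketch below is a minimal illustration that assumes that ratio definition, which may not match how the Otis IQ in Woodrow's study was derived; the ages and function names are hypothetical and are not Woodrow's data.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ: 100 times mental age divided by chronological age."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


def mental_age(iq: float, chronological_age_months: float) -> float:
    """Mental age in months implied by a ratio IQ at a given chronological age."""
    return iq * chronological_age_months / 100.0


# Two hypothetical pupils with the same IQ (110) at ages ten and eleven.
for ca in (120, 132):
    ma = mental_age(110, ca)
    print(f"CA {ca} months, IQ 110 -> MA {ma:.1f} months")
```

The same IQ corresponds to a higher mental age in the older pupil, which is why MA is the more direct index of present ability when gains within a single year are studied.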

Gray (45) studied individual variability from test to test in the six 
subjects tested in the Unit Scales of Attainment for 100 boys and 100 
girls of high, of average, and of low intelligence. In all comparisons the 
lowest 15 percent on the Kuhlmann-Anderson Intelligence Test showed 
greater intrapersonal (intertest) variability in achievement than did those 
of average or high intelligence. The sexes were found not to differ reliably 
in individual variability. 


1 Special acknowledgment is made to Mrs. Wille Boysworth, librarian of Huntingdon College, and to 
Emma Louise Wills and Carrie Pursell, librarians in the School of Education Library, University of Ala- 
bama, for facilitating the reading on which this survey is based. 

High school—Holzinger and Swineford (59) studied the relative effec- 
tiveness of a general intelligence test and a test composed of general and 
spatial factors in predicting achievement in high-school subjects. Their 
finding that the spatial factor correlates highly with achievement in 
shop (.46) and mechanical drawing (.69) confirms much earlier research; 
the relatively lower value of this factor in predicting achievement in 
plane geometry should not have surprised them. The study further justifies 
factorial analysis and design of intelligence tests. 

College and university—A cheerful note was struck by Durflinger (30), 
who found that the median correlation between intelligence and average 
college marks had risen from .45 (based on 100 correlations reported thru 
1934) to .52 (based on 47 correlations reported since 1934). As Durflinger 
mentions, this rise may be due to the increased availability of tests 
especially designed for college students, to a general improvement in 
intelligence tests, to improvements in college grading practices, or to 
a tendency by instructors to allow their grades to be influenced by knowl- 
edge of the student’s intelligence test scores—or possibly by all these 
factors in combination. 

Crawford and Burnham (24) reported their experience with the Yale 
Battery and the College Entrance Examination Board Tests: on the basis of 
correlational data, the tests were considered adequate to help in the dif- 
ferential guidance of students into verbal, linguistic (foreign language), 
technical, and physical-science courses, respectively. Goodman (44), sum- 
marizing the results of several studies at the Pennsylvania State College, 
reported that “the Thurstone Primary Abilities Tests correlate, on the 
whole, as well as most standardized intelligence tests with criteria of 
college success”; it may be pointed out, however, that the other intelligence 
tests require considerably less time for administration. Goodman also 
concluded that “the Thurstone Primary Abilities correlate with individual 
college courses to some degree and can be used for prediction of success 
in these courses.” Crawford and Burnham (25, Chapter VI), on the 
other hand, have questioned the value of the Primary Abilities Tests for 
differential prediction. 

Hartson (50) showed that the prediction of general academic success 
at Oberlin College was more effective among groups with high “effort 
quotients” than among groups with low. Weintraub and Salley (125), 
reporting on withdrawals for poor scholarship from Hunter College, found 
only a moderate difference between those in the upper versus the lower 
half of the distribution of intelligence test scores: 24 percent of those from 
the lower half had been dropped, and 14 percent of those from the upper. 
It should be remarked that the girls at Hunter were highly selected before 
entrance; but the selection was not made by the intelligence test in 
question. MacPhail and Bernard (76) reported on the use of the Brown 
University Psychological Examination for ten years in four hospital train- 
ing schools in Rhode Island, involving 1500 cases. In only two of the four 
schools was there a reliable difference between the average scores of those 
who graduated and those who did not; correlation between intelligence 
test scores and training school grades ranged from .42 to .60; those 
accepted for training averaged just higher than high-school senior girls 
and much lower than liberal arts freshman girls. 

Traxler (117) found that over a ten-year period, freshmen at teachers 
colleges using the American Council Psychological Examination averaged 
the same as junior college freshmen and the equivalent of only 3.8 IQ 
points lower than freshmen in four-year liberal arts colleges. 


Constancy of Intelligence Ratings 


Allen (3) found a correlation of .69 between Kuhlmann-Anderson IQ’s 
of 327 children who took the test as a group midway thru grade one 
and at the beginning of grade four. Townsend (114) reported correla- 
tions for the same measures as follows: between grades one and four, .65; 
between grades three and six, .70. 

Knezevich (64) reported IQ changes of more than 5 points for 56 of 
113 Wisconsin high-school pupils who took the Henmon-Nelson Intelli- 
gence Test in sophomore and senior years. He concluded that the changes 
were attributable to the unreliability of the test and to error in estimating 
the age for cessation of mental growth. 

Hirt (58) investigated retest IQ’s based on the 1916 Stanford-Binet 
Scale in 1357 cases referred for examination in a large school system. 
Using Terman’s seven categories, she found in this generally inferior 
selection that IQ’s remained static in 62 percent of the cases, declined in 
33 percent, rose in only 5 percent. Hildreth (56) found that average 
retest scores of superior children (Stanford-Binet IQ’s of 130 or higher) 
ran higher than their initial scores and concluded that this made it unwise 
to rely on a single IQ obtained before a child is ten in assigning pupils 
to special classes for the gifted. 

Taken together, these studies constitute confirmation of findings pre- 
viously known. Two trends are underlined. The reports of Allen, Town- 
send, and Knezevich reflect the fact that ordinary variations in IQ values 
under normal conditions and without introduction of special experimental 
factors include substantial numbers of changes from one testing to another 
of ten or more IQ points. The studies of Hirt and Hildreth, in which 
low IQ’s tended to drop and high IQ’s tended to rise, may be interpreted 
as a reflection on the concept of a constant IQ, on the standardization of the 
1916 Stanford-Binet Test, or both. 

For discussion of experimental factors influencing IQ constancy, the 
reader should also consult the section on “Environmental Factors.” 


Growth of Intelligence 


Jones and Conrad (62) provided a convenient summary and synthesis 
of findings on the growth of general intelligence; this report is limited 
to ages eleven to twenty. Conrad, Freeman, and Jones (21) reviewed 
the literature on differences in the growth of general intelligence among 
bright vs. dull, and among early-maturing vs. late-maturing children; 
characteristics of the growth curves of different mental functions were 
also presented. 


Environmental Factors 


Sherman’s recent textbook (100) includes a convenient thirty-page 
chapter covering the literature, prior to this triennium, on the ethnic, 
cultural, and educational factors affecting intelligence and its constancy 
of development. The article by Loevinger (69) presents a critique of 
quantitative studies of the proportional contribution of differences in 
nature and nurture to differences in intelligence; Loevinger’s discussion 
of statistical technic is especially recommended. 

Schooling—Wellman (126) summarized research on IQ-changes of pre- 
school and non-preschool children during preschool years, reporting that 
eleven of twenty-two preschool groups studied with the Stanford-Binet 
had average gains of six IQ points or more (N=1537), while only two 
of fourteen non-preschool groups showed similar gains (N=597). Similar 
results were found for groups tested with the Merrill-Palmer Scale. Iowa 
results differed little from those of other studies with Stanford-Binet Scales. 

Wellman and Pegram (127) reanalyzed, with analysis of variance tech- 
nics, the data originally published in 1938 by Wellman and others on 
the effect of orphanage environment and preschool attendance. They con- 
cluded that thirteen children with attendance in preschool on more than 
50 percent of the calendar days of their more than 400 days of preschool 
life, showed reliably higher IQ gains than the twenty-one in the control 
group of children who did not attend nursery school. A sharp critic 
of the original presentation, McNemar (83), analyzed Wellman and 
Pegram’s reanalysis of the data and, tho still critical, accepted as statisti- 
cally sound their conclusion that preschool environment produced gains 
in IQ. 

Bradway (9, 10) retested after a lapse of ten years 138 children origi- 
nally tested with the 1937 Revision of the Stanford-Binet Scale when they 
were between two and five and one-half years of age. She found IQ 
changes of fifteen points or more in over one-fourth of the cases; approxi- 
mately equal numbers, 24 and 26, had reliably higher and reliably lower 
IQ’s on retest. From data secured thru home interviews she concluded 
that the chief correlate of IQ-changes was not environment, but intelli- 
gence of parents and grandfathers. Bradway (11) also reported a special 
study, based on the same cases, of the preschool items of the Stanford- 
Binet, which she subdivided into four scales (verbal, nonverbal, memory, 
and number-concept) and correlated with IQ’s obtained ten years later. 
She found correlations of .45 to .62 and concluded that the verbal and 
memory scales were the better predictors of later IQ. In view of current 
factor theories of intelligence, further study of the correlations of 
preschool part scales with part scales ten years later would be desirable. 

Lorge (72) reported a study of intelligence test scores of 131 men 
age thirty-four who constituted a representative sample of 863 boys 
tested twenty years earlier at age fourteen. He showed with detailed tables 
the tendency for those who completed more grades in school to have reliably 
higher retest intelligence scores than others who were roughly matched 
with them in IQ when initially tested. Garrett (37) criticized Lorge’s 
article for roughness of equating, small size of sample and “misuse” of 
the term IQ, which appears in Garrett’s argument to be constant by 
definition. In making these criticisms, however, he ignored a basic criti- 
cism of Lorge’s conclusion, recognized by Lorge, namely that “grades 
completed” does not measure simply amount of additional schooling, but 
amount of additional schooling resulting in promotion at a time when 
promotion meant higher educational achievement. Granted that those 
promoted several grades did better on retest, what would have been true 
if those not promoted had been promoted and given more schooling? 
Would they, too, have gained by more schooling? 

Wesman (128, ) explored “the comparative contributions of several 
of the more popular high-school subjects to mental growth as measured 
by ability to score on an intelligence test.” He found that the gains in 
achievement in separate subjects at the high-school level showed little 
correlation with gains in intelligence test scores for the same period; 
he also found lower correlations between achievement in particular sub- 
jects and scores on intelligence tests taken after studying the subjects 
than the corresponding correlations between achievement and scores on 
intelligence tests taken before studying the subjects. He explained both 
findings as due to the fact that “higher levels in a subject are more 
specific to it than are lower levels.” In keeping with his environmentalistic 
thesis he concluded that his study “indicates the desirability of direct 
training in mental processes rather than dependence on transfer from school 
subjects.” 

Schmidt (99) analyzed the needs of 254 boys and girls, twelve to 
fourteen years old, all of whom had been originally classified as feeble- 
minded, mean IQ = 51.7. On the basis of this analysis of physical health, 
mental abilities, academic achievement, behavior patterns, family, edu- 
cational and community backgrounds, an experimental educational pro- 
gram was set up for these adolescents that was characterized by group 
planning, group experiences, inschool reproduction of situational expe- 
riences, and use of creative and manipulative arts. After three years, the 
pupils averaged 4.1 years gain in educational achievement. After eight 
years, 27 percent had completed four years of high school; the experimental 
group had average adult intelligence, while a control group had dropped 
an average of 3.6 IQ points. This degree of improvement is striking 
and worthy of critical evaluation which has not yet appeared. 
Factors related to the home—Brown (13) reported that 1000 cases 
he had tested at age six with the 1937 Revision of the Stanford-Binet 
Scale showed the same narrow dispersion reported by Terman and Mer- 
rill in their standardization studies. Brown interpreted these findings as 
reflecting the dominant influence of the home up to age six, exerting 
pressure toward conformity. 

Patterson (89) reanalyzed data previously reported by Wallin on the 
fluctuations of IQ of two siblings tested over twenty-five times in a four- 
teen-year period and found greater parallelism in the curves when results 
were plotted for the same calendar years, when the siblings were four years 
different in age, than when plotted for the same chronological ages of 
the two children. He concluded that environmental influences affecting 
the family as a whole may have produced this concomitant tendency. 

McHugh (80) gave the Goodenough Draw-a-Man Test to eighty-three 
of the ninety-one kindergarten children on whose Stanford-Binet Test 
results he had reported earlier, with a view to exploring the hypothesis 
that pupils who gained in MA and IQ on the Stanford-Binet Test because 
of speech development over the two months between testings, would not 
show similar gains in a nonverbal test of intelligence. Reliable gains in 
Goodenough MA and IQ were found, which correlated only .16 and .17, 
respectively, with Stanford-Binet gains in MA and IQ, but —.36 and 
—.44 with Barr Occupation Ratings for fathers, and —.30 and —.34 
with education of father. Because these negative correlations reflect 
greater gains by those with fathers of low occupational and educational 
status, it was suggested that advantages associated with drawing at home 
left children of favored parents less to gain from kindergarten in drawing. 
Darcy (28) in a study of 212 children of preschool age found in all 
sub-groups with respect to age and sex that the bilinguists were inferior 
on the 1937 Stanford-Binet and superior on the Atkins Object-Fitting Test. 
Her study suggests the desirability of exploring the possibility that the 
differences represent outcomes of psychological compensation. 

Livesay (68), studying 1383 high-school seniors in Hawaii, found the 
usual relationship between mental test scores and economic status. 
Economic status seems further to be related to order of arrival of immi- 
grant groups, as in continental United States history. 

Skodak and Skeels (102) rendered an extensive follow-up report of 
139 children of parents of low intellectual, educational, and occupa- 
tional status who were placed in foster homes at an average age of three 
months. They found the mean IQ had moved from 116 at average testing 
age of two years three months, to 112 at four years four months, and now 
113 at seven years one month. 


Ethnic groups—Havighurst and Hilkevitch (52) gave the Arthur Per- 
formance Test to 670 Indian children, age six to eleven, of six tribes. 
The Hopi Indians were superior to the norms based on white children, 
while most of the other groups were approximately at the norms. Con- 
trary to expectation, the Indian children worked just as rapidly as white 
children on the test. It would appear that performance tests are more 
appropriate to testing intelligence of these Indian children for guidance 
purposes than are verbal intelligence tests. Havighurst and others (51) 
administered the Goodenough Draw-a-Man Test to a representative group 
of over 300 of the Indian children previously tested with the Arthur 
Performance Test. Most of the Indians did better than white children 
on the Draw-a-Man Test. The difference is attributed largely to the fact 
that “the Indian children, and especially the boys, are stimulated to take 
an active interest in the world of nature, and given much opportunity 
to form and express concepts of natural objects, including the human 
body, on the basis of their own observation.” 

No reliable difference in mean score on the 1916 Stanford-Binet Test 
was found by Brown (12) in his well-designed study of 323 second- 
generation Scandinavian and 324 second-generation Jewish children. The 
Jewish surpassed the Scandinavian on certain test items, and vice versa. 
Cultural-experiential factors are invoked to explain these differences. 

Applying the discriminant function technic to results from the Bellevue- 
Wechsler Adult Intelligence Scale, Machover (75) concluded that “the 
subtest pattern of culturally very restricted southern Negroes runs 
counter to expectations based on the assumption that performance tests 
are less culture-bound than abstract verbal tests.” Tomlinson (113) 
studied seventy-five sibling pairs of Negro children in Austin, Texas, each 
of whom had been given both forms of the revised Stanford-Binet in close 
succession. Mean IQ for the younger siblings of the pairs was 92.5, for 
the older 86.7, indicating a reliable difference ascribed to cumulative 
environmental effects. Children from better homes, as rated by the Sims 
Socio-Economic Scale, had higher scores in both groups. 

McGurk (79) reported the usual reliable differences in favor of whites 
over Negroes in southern cities. He went on to propose as a clinically 
sounder basis for evaluating mental deficiency, the development of local 
norms for Negroes in places where they are segregated and under- 
privileged. 

Klugman (63) found a small but not statistically reliable superiority 
of money over praise as incentive among seventy-two children in grades 
two to seven who took both forms of the 1937 Revised Stanford-Binet Test. 
Between the thirty-eight white children and the thirty-four Negro chil- 
dren there was no difference under money incentive, but when praise 
was the incentive, the white children made equally good scores while the 
Negro children were on the average three IQ points lower. 

Montagu (84) compiled and compared data on the achievement of 
northern Negroes and southern whites on intelligence tests used in World 
War I, and summarized the results in thirteen tables and ten maps as 
a graphic illustration of his thesis that factors other than race, chiefly 
socio-economic, account for many of the differences found, since many 

Review of Educational Research, Vol. XVII, No. 1 

of the differences favor Negroes. Montagu (85) also presented his data 
in a polemic volume. Garrett (35) presented a bill of exceptions to 
Montagu’s earlier article. Most serious of his criticisms of Montagu’s 
research is the fact that the sampling of draftees assigned for Alpha and 
Beta Tests differed from one examining center to another. 

Garrett (36) also climaxed an extended discussion among biologists, 
anthropologists, psychologists, and semanticists on the question of race 
differences by analyzing some of the data offered by opponents of a con- 
cept of race differences. The complete series of communications, which 
can be traced back thru several months’ issues of Science, will reward 
the careful reader with a summary of the factors affecting “race differ- 
ences” that need to be understood in connection with the facts and 
their interpretations in current living. 


Biological Factors 


Cook (22) summarized the findings of medico-psychological research 
on the effects of the Rh blood-factor in producing feeblemindedness. 
Approximately 11 percent of all marriages involve incompatibility between 
husband and wife with respect to this factor, whence in births after the 
first the probability of feeblemindedness in offspring is substantial. Ref- 
erences to basic research were given. 

Gardner and Newman (34) described the fifth of a series of quad- 
ruplets. This set is unique, being monozygotic. Altho almost identical 
in attributes, as in heredity, differences in their Stanford-Binet mental 
ages, Army Beta mental ages, and Stanford Achievement Test sub-scores 
at ten years three months correspond remarkably to differences in height, 
weight, and other skeletal measures within the set. It was pointed out that 
this is in agreement with findings of the famous Dionne quintuplets. 

Using a sample of 613 students (men and women), Gaskill and Fritz 
(38) determined conclusively that intelligence at the college level is 
unrelated to basal metabolic rate. Pathological cases were not included. 

Guetzkow and Brozek (46) studied the effect of extended vitamin 
B-Complex deprivation on eight normal young men. They found no drop 
in intelligence test scores during 161 days of restricted diet. Reliable 
decline was observed on the mental tests most dependent on speed during 
twenty-three days following, when they were totally deprived of these 
vitamins, but this was righted by a ten-day period of full vitamin supply. 
“As compared with biochemical, physiological, and other psychological 
aspects of fitness, the intellective functions were among those which 
proved to be most resistant to the imposed dietary stress.” 


Exceptional Groups 


Aurally and visually handicapped—Myklebust and Burchard (87) gave 
the Arthur Performance Scale to 121 congenitally and 68 adventitiously 
deaf children of school age at a state institution for the deaf. They found 
no reliable differences between these two groups. Boys did better than 
girls, the difference being reliable at the 5 percent level, thus confirming 
previous findings. Capwell (14) described the problems and values in 
applying intelligence tests in a school for the deaf, in particular demon- 
strating the useful part a performance test (Arthur) played in a total 
program of educational and vocational guidance. 

Hayes (54) described and discussed his Interim Hayes-Binet Intelli- 
gence Tests for the Blind, 1942 revision, which is based on the 1937 Revision 
of the Stanford-Binet. 


Delinquents—Ludden (74) and Kvaraceus (67) found in separate 
studies that low IQ is a factor predisposing to delinquency, but is only 
one of several such factors. Ludden proposed a critical total of three or 
more out of ten predisposing factors, including IQ below 90, as a practi- 
cal index of potential delinquency. 

Porteus (93), extending his earlier studies, confirmed his original 
finding that qualitative scoring of his Maze Test establishes reliable dif- 
ferences in favor of nondelinquents. Differences between behavior problem 
children and others, between satisfactory and unsatisfactory cannery 
workers, were favorable to the socially approved groups. 


Mental disorders—For studies on the use of ability tests in the differen- 
tial diagnosis of mental disorders, see Chapter VIII. 


Miscellaneous 


Adult groups—Thorndike and Gallup (109) described the standard- 
ization of two twenty-item vocabulary tests in the course of a routine 
opinion poll. Among others, one interesting finding is that voters (for 
either Roosevelt or Willkie in 1940) made reliably higher scores than 
those who neglected their right of franchise. 

Sward (108) administered a battery of eight difficult intellectual tests 
to forty-four professors, age sixty to eighty, and forty-four faculty mem- 
bers, age twenty-five to thirty-five, drawn from the same two institutions. 
In general, the younger men outscored the older. Individual differences 
were more impressive than mean differences between the two age-groups. 
No change of any significance was detected within the ages sixty to 
eighty. The results were considered largely a by-product of disuse or 
an artifact of the particular tests employed. The writer seems to have 
leaned over backward in his summary and interpretations to soften the 
blow for those of us past thirty-five. 

In a suggestive study, Cleveland and Dysinger (19) found that 20 
senile psychotics could respond to the abstractions in the 
verbal items of the Bellevue Adult Intelligence Scale, but were unable 
to sort objects satisfactorily on an abstract basis. How deep and broad 
is the meaning of verbalized relationships? 

Administrators—Mandel and Adkins (77) administered the American 
Council on Education Psychological Examination (linguistic or verbal 
section only) to several groups of federal administrators. For twenty 
individuals in the top management group, the correlation between the 
verbal section and the criterion of over-all performance was .64. Favor- 
able results were also obtained with other groups of administrators. 


Sex difference—Rabin and Weinik (96) gave the Nebraska revision 
of the Army Alpha Examination to ninety student nurses; their scores 
on factor N (number) were notably low. The authors consider this a sex 
difference, rather than an occupational characteristic. 


Interrelations—Cattell (15, 16, 17) conceived and executed an elaborate 
exploratory study of personality traits associated with possession of gen- 
eral intelligence, drawing ability, and mathematical and verbal ability 
at the high achievement levels represented in the Graduate Record Exami- 
nations. Starting from the assumption that group factors found in factor 
analyses of ability tests may simply be symptomatic of environmental and 
intrapersonal interests directing persons of given levels of general intelli- 
gence to specialize in particular abilities, he cast up a preliminary frame- 
work of thirty-five trait characters, whose rated occurrence in 208 male 
adults was factorially analyzed into twelve principal components, as a 
background against which to study relations with intelligence and the 
specific abilities. From this analysis he concluded that intelligence test 
achievement is well identified as a general factor of effective habits of 
thought and work, with relations to emotional stability and integration; 
verbal and mathematical ability at high levels is related to general 
intelligence and its associated personality factors, to character maturity 
and to extensive educational background; verbal ability is also possibly 
related to lack of sociability, resulting in a preference of books over 
people, and to a more sensitive, less masculine personality resulting in 
the type of superiority with words characteristic of the feminine; mathe- 
matical ability is associated with low dominance. This type of research, 
by its nature, verges on the spurious because of the dependence on 
armchair speculation and interpretations, but it offers promise of pro- 
ducing insights into total personality organization not obtained thru 
more controlled research approaches. 


Evaluation by experts—Kornhauser (65) reported in great detail a 
questionnaire study of the opinions of seventy-nine “experts” on values 
and trends in intelligence testing. A substantial majority, fifty-five, 
favored future emphasis on separate factors as distinguished from gen- 
eral intelligence. Almost all (92 percent) agreed that there is a serious 
public misunderstanding of the values and limitations of intelligence tests. 
CHAPTER III 


Measurement and Prediction of Special Abilities 


HAROLD D. CARTER 


Like the other chapters in this issue, the present review cannot pretend 
to be exhaustive; if only because of lack of space, it must be selective. 
Official military and naval staff publications have not been included, since 
these will be covered in a later issue of this Review. The literature in 
some fields, for example reading, is so voluminous that it must be reserved 
for special treatment, altho a few contributions exemplifying the recent 
analytic trend are reported. 


Trends 


A number of trends seem to be revealed in recent research. One is 
a marked emphasis upon analysis of intellectual abilities into special 
components. There is renewed concern with methods of recording, analyz- 
ing, and interpreting data from batteries of special abilities tests (11, 
138, 142). The war has had the effect of stimulating research upon 
visual and auditory perception, mechanical and other special abilities, 
and various aspects of achievement (38). There is continued interest in 
factor analysis as a method of isolating special abilities (28, 46, 138, 
139). The relationships between special abilities and interests continue 
to receive attention (78). Variations from ordinary pencil and paper 
technics in testing include the use of subjective judgments and increased 
use of apparatus (151). Practices in the army and navy (29) suggest, 
as do other lines of evidence, that strict lines of demarcation between tests 
of general ability, special ability, achievement, and personality cannot 
be maintained. The modern trend is away from emphasis upon verbal 
tests, away from age scales, and toward the use of group tests of the 
point-scale type (83). 

These trends do not seem particularly new; they have had their ante- 
cedents, and they now appear as the natural flowering of a long period 
of development in psychometrics. However, these tendencies may be 
regarded as a noteworthy aspect of research in 1943-46. 


Criteria 


Criteria for the definition of special abilities have usually included 
evidence regarding reliability and validity of measurement, as well as 
independence from “general intelligence.” The use of factor analysis in 
the identification of special abilities amounts in some instances to the 
substitution of other criteria. An example of research by the new method 
is the study by Thurstone (138), who applied factoring methods to the 
study of a battery of perceptual tests and tests of primary mental abilities. 
He found eleven perceptual factors, which turned out to be essentially 
uncorrelated. Some of the factors seem to hold promise as measures of 
particular academic abilities. 

Burt (20) has indicated the desirability of distinguishing between 
abilities and “mental factors.” His report includes a discussion of basic 
scientific procedures of measurement, which include the taking of sums 
or averages to achieve reliability and rule out trivia, and the taking 
of differences to isolate that which is independent of other measures. 
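
The two procedures can be illustrated with standard psychometric formulas. The sketch below is not taken from Burt's report: it assumes parallel measures with equal variances, uses the Spearman-Brown formula for the reliability of a sum and the usual formula for the reliability of a difference score, and the numerical values are invented purely for illustration.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the sum (or average) of k parallel measures,
    each having reliability r_single (Spearman-Brown formula)."""
    return k * r_single / (1 + (k - 1) * r_single)


def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y, assuming X and Y have equal
    variances, reliabilities r_xx and r_yy, and intercorrelation r_xy."""
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


if __name__ == "__main__":
    # Hypothetical values: summing four parallel measures raises reliability,
    # while differencing two correlated measures lowers it.
    print(round(spearman_brown(0.50, 4), 2))
    print(round(difference_reliability(0.80, 0.80, 0.60), 2))
```

Under these assumptions the sum is more reliable than any single measure, whereas the difference between two correlated measures is less reliable than either, which is the price paid for isolating what is independent.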

A study by Reichard (108) calls attention to the importance of the 
subjects’ age and certain technical features of test make-up in the isola- 
tion of special abilities. Reichard’s battery of eight tests included three of 
verbal abilities, two of number abilities, two of memory, and one of 
spatial relationships. The degree of intercorrelation increased from age 
nine to age twelve, and decreased from age twelve to age fifteen. 

Several studies, for example Kirkpatrick’s (75), have called attention 
to the implications of definitions of aptitude and skill, and have stimu- 
lated reconsideration of terminology, especially when viewed along with 
the concept of aptitude tests of the so-called readiness type. Discussions 
like those of Davis (30), Cockett (24), and Blain (14) are significant 
for those primarily interested in incorporating programs of special ability 
testing into the framework of educational and social guidance. The arti- 
cle by Scates (112) described “differences between measurement criteria 
of pure scientists and of classroom teachers.” 


Academic Abilities 


Perhaps group mental tests of the commonly-used verbal types are 
properly regarded as measures of a special ability for academic work. 
Studies such as those by Crawford and Burnham (25) have given exten- 
sive consideration to the prediction of academic success, at the college 
level. In predicting educational success in general, various measures of 
special verbal abilities are still emphasized (47, 101). In a more spe- 
cialized study, Berg, Johnson, and Larsen (12) have shown that tests 
in the mechanics of expression are valid in prediction of grades in rhetoric. 

Studies of the use of previous records in the prediction of college 
success are largely outside the scope of this review. When they include 
more specialized tests, such studies (123) tend to indicate that total pre- 
vious record predicts better than do the special tests. When educational 
achievement is objectively measured, tests predict it better than when 
achievement is estimated in terms of more subjective criteria. Durflinger’s 
review (35) showed that mental tests predict college success better in 
recent years, while achievement tests showed higher correlations with 
college success in earlier investigations. The report by Douglass (33) 
suggested the need for more specialized measures in prediction, inasmuch 
as the abilities required for success in various curriculums are often quite 
different. 

Academic abilities are somewhat specialized; the modern trend is 
toward analyzing them into their specialized components. A small subsection 
of the literature on reading, for example, shows a tendency to 
analyze reading abilities into a large number of highly special and 
largely independent abilities. Similarly, in other fields there is evidence 
of interest in detailed and diagnostic measurement. Brody (19) has 
shown that spelling tests in different forms measure independent abilities. 
Simpson (120) has indicated that a group factor might be measured 
by means of a battery of spelling tests. Studies by Stroud (130) and 
by Hall and Robinson (53) have shown that reading skills can be 
analyzed into a considerable number of independent sub-skills. Davis 
(28) applied the method of principal axes in the factor analysis of 
a battery of reading tests. Thurstone (139) reanalyzed the data, finding 
that the centroid method revealed only one factor of importance. To the 
present reviewer, it seems that the finding reported here has more general 
implications. Some of the most popular reading test batteries imply 
by their detailed analysis charts that the tests measure several highly- 
specialized abilities. However, from the processes used in the subtests 
it seems quite unreasonable to suppose that the typical modern reading 
test battery yields reliable measurement of very many factors. 


So far as this review is concerned, the interest here is not in the 
measurement of achievement, but in those aspects of the studies concerned 
with the analysis of academic abilities into components which can be 
measured and predicted by psychological tests. There is obviously here 
no clear-cut division between tests of achievement as such and tests of 
specialized abilities. 


Davis and Henrick (31) have shown that a special test is effective 
in predicting ability in geometry. Goddeyne and Nemzek (45) found 
the Lee Test of Geometric Ability more effective than other measures for 
predicting ability in geometry. Guiler (52) secured a very high validity 
coefficient, .78, for the Iowa Algebra Aptitude Test. 

Wittenborn and Larsen (149) showed that a special linguistic ability 
test is more effective than other measures in the prediction of achieve- 
ment in college German. 


Numerous studies (4, 7, 68, 72) imply the existence of a special ability 
which might be called critical thinking ability. The analytic technics 
relating to test reliability, test validity, independence from general intel- 
ligence, and independence from other previously established measures 
should be applied here with special care. Since the existence of special 
abilities for critical thinking is so frequently implied by the naming 
of tests and the development of programs of instruction, it seems desir- 
able to call for a program of research to reveal the nature of the abilities 
more clearly. Perhaps variance in scores on tests of “critical thinking” 
is largely explainable in terms of intelligence, special reading skills, and 
special categories of factual information; very likely factors of attitude 
and motivation are also involved. 
Scientific Aptitude 


Edgerton and Britt (36) have described procedures used in a compre- 
hensive search for science talent. Hoffman (65) has criticized the method, 
objecting mainly to the use of the successive-hurdles method and to 
the use of criteria which emphasize mathematical ability and social 
competence. He seems to feel that more emphasis should be given to 
abilities involved in such fields as botany, horticulture, ornithology, 
astronomy, and genetics. The second article by Edgerton and Britt (37) 
was offered in rebuttal. 

Available tests for the measurement of aptitudes for scientific work 
tend to include measures of mathematical abilities, quantitative thinking, 
and special categories of information. No doubt more tests should be 
constructed, in greater variety, and studies should be undertaken to 
determine their validity in the prediction of abilities for different types 
of scientific work. 

Howard’s monograph (67) dealt with the complexity of mental pro- 
cesses to be tapped in science testing. Judges showed very good agreement 
in rating items as requiring memorization versus complex integration of 
information. Student judgment, but not expert judgment, was markedly 
influenced by item difficulty. Factor analysis indicated that the test of 
scientific ability used in the study measures three factors, namely: science 
achievement, an intellectual factor, and a complexity factor. 


Aptitudes for Professional Work 


1. Engineering—Several studies (15, 43, 50, 82) have dealt with the 
use of tests in predicting success in engineering training courses. 
Frandsen and Hadley (43) found tests of mathematics and of electrical 
information more effective than tests of general intelligence for prediction 
of success in a radio training school. Bolanovich (15) found tests of 
personality, of achievement in mathematics, and of general fitness had 
moderately high validity for prediction of success of female engineering 
trainees. Lawshe and Mills (82) found that tests of school achievement, 
of mechanical ability, and of knowledge in special fields were valid and 
efficient for prediction of success in training courses in electricity in the 
Navy. The tests measured ability to read simple measurements and to 
solve simple arithmetical problems, knowledge of practical electrical infor- 
mation, and mental alertness. A multiple correlation of .82 indicated the 
validity of the tests for prediction of achievement in the training courses. 
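
For readers unfamiliar with the statistic, a multiple correlation such as the one reported above can be computed from simple correlations. The following is a minimal sketch for the two-predictor case only; the function name and the input values are hypothetical and are not drawn from any of the studies reviewed.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


if __name__ == "__main__":
    # Hypothetical validities of .60 and .50 for two tests correlating .30.
    print(round(multiple_r(0.60, 0.50, 0.30), 3))
    # The same validities with uncorrelated tests give a higher multiple R.
    print(round(multiple_r(0.60, 0.50, 0.00), 3))
```

The comparison suggests why batteries of relatively independent tests are favored: the less the predictors overlap, the more each adds to the multiple correlation.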

Griffin and Borow (50) have announced a new test of aptitude for engi- 
neering and physical science. The test includes sections dealing with 
mathematics and arithmetic, mechanical comprehension, and verbal com- 
prehension. Multiple correlations for validity as indicated by course 
achievement were as high as .79. The test is valid for both women and 
men, altho separate norms are necessary. 

Andrews (5) used a battery of tests of manual dexterities and intelligence 
in selecting workers in engineering jobs, finding that the tests are 
valid and efficient, but that interview ratings add to the efficiency of 
selection. Shuman (119) has shown that tests of intelligence, mechanical 
ability, and mechanical comprehension are valid for discriminating be- 
tween effective and unsatisfactory workers in aircraft engine and propeller 
industries. Holliday (66) studied the use of psychological tests in engi- 
neering industries; over a period of four and one-half years, the skills 
of apprentices approached cumulatively nearer to the levels predicted 
by tests of intelligence and mechanical aptitudes. In the beginning the 
tests of intelligence were more effective in prediction, but later on the 
special ability tests were more effective. This study by Holliday offers 
stimulating suggestions concerning dynamic aspects of prediction. 

2. Music and art—Gilkinson (44) found that musical or auditory abili- 
ties as measured by the Seashore Tests have little relationship to speech 
skills. This study checks on an hypothesis which seems deserving of 
persistent additional examination, in view of present knowledge concerning 
the speech of the deaf. 

New tests of musical abilities, covering time discrimination, melodic 
and harmonic transposition, and melodic and harmonic sequences, have 
been announced by Lundin (88). After statistical analysis, the author 
concluded that the tests measure important aspects of musical temperament 
and ability. Dunlevy (34) found that tests of musical talent are valid 
for prediction of preferences and achievements in various types of musical 
training. 

Barrett (9) applied a battery of tests of interests, mechanical ability, 
and art judgment to students in a liberal arts college. She was able to 
show that the tests could be used in combination to measure suitability 
for entrance into an art curriculum. 


3. Law—Adams (1) has reported high validity coefficients for the 
Iowa Legal Aptitude Test, which includes tests of verbal ability, information, 
comprehension, and reasoning. The correlations with first-year law 
achievement ranged from .48 to .76. The highest multiple correlation 
found was .77. 


4. Medical and dental aptitudes—In a discussion of the criteria for 
selection of medical students, Turner (147) has pointed out that aptitude 
tests, plus a good record in premedical work, plus evidence from essay- 
type exercises, furnish the best basis for admission to medical schools. 

Smith (125) has published a long article on testing of aptitudes for 
dentistry. The most effective methods involve use of tests of scholastic 
aptitude, mechanical and manual abilities, and vocational interests. Smith 
reports that the most reliable predictions make use of predental records 
as well as of test results. 


5. Nursing—Potts (106) found that a battery of tests of general and 
special vocabulary, and of mechanical and educational abilities, had 
considerable validity in prediction of aptitude for nursing. No personality 
type for nurses was indicated. Crider (26) reported that tests of intelligence 
and of reading and arithmetic are valid, but that tests of adjustment 
and of interests contribute little in predicting success in a curriculum 
for nurses. These studies, like those in the medical and legal fields, indicate 
that the criterion is best predicted mainly by academic abilities, so 
long as the particular curriculum is enforced. 

6. Teaching—A study by Seagoe (115) indicated that tests of scholastic 
aptitude in general, and of special achievements, as well as of personality, 
are useful in the pretraining selection of teachers. Candidates for teacher-training 
seem to be definitely superior in verbal abilities, in quantitative 
thinking, and in knowledge of contemporary affairs, as well as in manual 
and musical abilities. Seagoe’s second study (116) indicated that tests 
of personality such as the Bell Adjustment Inventory, the Bernreuter Per- 
sonality Inventory, etc., are significant in the prediction of teaching suc- 
cess. The Morris Trait Index and the Coxe-Orleans Prognosis Test were 
also valid, but measures of intelligence and of special abilities were not 
useful. Tests of interests and of attitudes were also found ineffective. The 
negative results are interpreted as possibly due to the highly selected 
nature of the group of candidates for teaching. 

Numerous studies have dealt with aspects of pupil achievement as 
criteria of teaching effectiveness. These studies constitute basic research 
upon the criteria which must be involved in the development of any effec- 
tive tests of aptitude for teaching. According to Rostker (110), intelli- 
gence is the major factor in teaching efficiency as measured by pupil 
progress, while personality of teachers is unimportant, and teachers’ social 
attitudes are of intermediate significance. Rolfe’s findings (109) appear 
inconsistent with those of Rostker, while LaDuke’s investigation (80) 
supported Rostker’s to the extent of indicating the importance of in- 
telligence as a factor in teaching effectiveness. 

McCoard (89) found that scores for speech factors are correlated to 
the extent of .45 with teaching effectiveness as measured by pupil gains. 
McCoard’s study is stimulating, suggesting as it does the use of a new 
and specialized and objective technic for prediction of teaching success. 

Several studies have considered various aspects of teacher rating. A 
factor analysis by Smalzried and Remmers (122) indicated that student 
ratings of faculty members measure two factors, namely: an empathy 
factor emphasizing sympathy and fairness, and a professional maturity 
factor emphasizing self-reliance, confidence, and ability in the presenta- 
tion of subjectmatter. Dodge (32) studied personality traits of effective 
and ineffective teachers, using a self-rating technic. He found that the 
more successful teachers rate themselves as more at ease socially, more 
willing to assume responsibility, more sensitive to the opinions of others, 
less willing to hurry in making decisions, and less subject to fears and 
worries than the less successful teachers. Gotham’s study (49) showed that 
various indicators of personality of teachers are valid as judged against 
teacher-rating criteria, but not valid by the criterion of pupil gains. These 
studies are here regarded as significant contributions to the development 
of tests for teaching abilities. They clarify the criteria against which 
such tests are judged, and they indicate the content of subtests for bat- 
teries of tests to be used in measuring special abilities for teaching. 

A factor analysis of teacher abilities, undertaken by Hellfritzsch (63), 
showed that various tests and other indicators of teachers’ abilities measure 
four factors, namely: general knowledge and mental ability; a teacher 
rating scale factor; a measure of personal, emotional, and social adjust- 
ment; and a favorable attitude toward the teaching profession. Practically 
all the variance in pupil gains could be accounted for by pupil factors 
and teacher factors. One receives the impression that teaching ability is a 
loosely-organized complex of four types of variables. 

A somewhat different type of research is that by Anderson and Brewer 
(3), who developed a reliable observational technic for measuring domina- 
tive and socially integrative behavior in the classroom. The study clearly 
suggests the desirability of further research using a new criterion of 
effective teaching. A somewhat similar type of study was done at the 
nursery school level by Landreth and others (81). It is to be hoped 
that the ideas involved in these studies can be assimilated and used in a 
program of objective testing of special abilities needed in effective teaching. 


Visual Acuity 


In testing visual acuities, the trend has been toward recognition of 
the complexity of common visual work, and toward detailed measurement 
of its varied aspects. Brandt (18) has devised an instrument which records 
on 35 mm. film the location, duration, and sequence of eye fixations, and 
the distance and direction of all movements. Limitations of the common 
test chart for measuring visual acuity have been pointed out by Luckiesh 
(86), who noted the effects of brightness levels and contrast effects 
upon such measurement. Low (85) described a technic of measuring 
peripheral visual acuity. He found it extremely variable, somewhat subject 
to training, and relatively independent of other visual functions. 
Sherman (118) reported that training in drawing and painting improves 
peripheral acuity as measured, and affects certain other visual abilities, 
but does not improve central acuity. 

Several studies have taken account of visual defects as factors in school 
achievement. Park and Burri (100) indicated that a summed score for 
eye defects was somewhat negatively correlated with reading level among 
pupils in elementary school. A visual test survey of 5000 school children, 
reported by Dalton (27), indicated that about five out of six elementary-school 
children have some visual defects, and that many variables measured 
are unrelated to school achievement in general or in the special field of 
reading. In summary, one might say that many visual abilities are not 
valid indicators of educational achievement, but they have their importance 
in the field of health in general, and the field of visual well-being in 
particular. 
Color Vision 


Age and sex differences in color discrimination have been discussed by 
Smith (124), whose findings suggest that differences found in color-matching 
tests may be in part due to experience, and only partly determined 
by native capacities and maturation. Pickford (102) found that 
women who are blood-relatives of color-blind persons have red-green 
weakness much more frequently than other women. Chapanis (23) 
reported that the saturation of the spectrum is reduced for persons with 
certain color deficiencies, but that they can equate brightnesses more 
easily than normal subjects can. Murray (96) has reviewed the develop- 
ment of color-vision tests. 

Recently there has been much criticism of the Ishihara Test. Pickford 
(103) found it inadequate for discrimination of degrees of color blindness. 
Hamilton, Briggs, and Butler (54) found that it fails under certain 
circumstances to discriminate between responses of normal and color-blind 
persons. Harris (59) reported that the Ishihara plates are better 
than those of the American Optical Company. Taylor (133) used the 
hues of negative after-images matched against Munsell color chips as a 
criterion, and found the Ishihara Test inadequate. Hardy, Rand, and 
Rittler (56) found some of the Ishihara plates relatively useless, and 
reported that the test is markedly affected by the conditions of illumination 
under which it is used. They consider it a crude screening device, likely 
to give deceptive results. 

In a study of age differences in color discrimination, Smith (124) 
used a matching method, with Munsell materials. The ability to dis- 
criminate increased rapidly up to age twenty-five, and dropped markedly 
after age sixty-four. Females were superior between ages five and eleven, 
and males superior after age fourteen. 

New methods for measuring color-vision differences have been announced 
by Sloan (121) and by Hardy (55). The technics most commonly 
employed involved matching, judging, and reporting of facts 
concerning after-images. The newer technics not only involve these 
devices, but also present modifications and improvements in method and 
in the use of auxiliary aids. 


Auditory Testing 


The war has stimulated many studies of aircraft-operating personnel, 
among whom an auditory deficiency has been found at about 4096 
cycles per second. This deficiency has commonly been attributed to air- 
craft noise, gunfire, etc. Senturia (117) tested aircraft personnel prior 
to exposure to traumatic stimuli, and found the well-known deficiency 
at about 4096 cycles per second in about 19 percent of the persons 
tested. High-frequency deafness has commonly been reported in war 
studies. Plummer (105) has given special attention to high-frequency 
deafness and the discrimination of speech sounds in high-frequency 
ranges. It is significant for the validity of hearing tests that he found 
that discrimination of consonants depends extensively upon discrimina- 
tion in the more fundamental frequencies. Hearing aids are ordinarily 
considered helpful only in cases of rather severe losses in hearing in the 
speech ranges of tonal frequencies. 

Numerous studies have dealt with audiometric tests and their use. 
Osborn (99) reported a study in which tests were taken twice on 248 
children, with a year’s interval between tests. Those who had received 
medical treatment showed improvement in 85 percent of cases, as com- 
pared with 23 percent for the group not having medical aid. Hughson 
and Thompson (69) reported that fairly accurate tests can be made of 
children two years of age and older. Templin (135) considered the effect 
of psychological factors on measurement of sound discrimination of 
elementary-school pupils, and also reported evidence that a brief series of 
sound stimuli is valid as judged against a longer series. Fowler (42) 
pointed out the importance of psychological variables such as memory, 
auditory perceptual skill, and interpretative skill, in clinical diagnosis of 
hearing deficiencies. 

Several technical problems have been attacked. Carter (22) has at- 
tempted to work out a method of presenting hearing losses in terms of 
a single index or figure. His study suggests interesting possibilities for 
research on the optimum weighting of scores for hearing in different 
tonal frequency ranges, for the prediction of significant clinical losses in 
hearing. Harris (58) described the apparatus used in group audiometric 
testing, and showed that the reliability of results compared favorably with 
that of the usual individual testing method. Goldman (48) presented a 
comparative study of whisper tests and audiograms, showing that the 
relationship between the two appears not to be simple or constant, and is 
dependent upon parallel hearing losses in both ears. Technical problems 
in testing with commercially available audiometers have been discussed by 
Grossman and Malloy (51). 


Mechanical and Manual Abilities 


Research has continued to provide data which increase the usefulness 
of available tests. For example, Tuckman (146) has provided norms for 
special groups, males and females, for the Minnesota Rate of Manipulation 
Tests, and has studied the relationship of scores with age and intelli- 
gence. Stephens (129) has contributed norms for the Minnesota Paper 
Form Board Test. More norms for the Minnesota Paper Form Board 
Test have been presented by Baldwin and Smith (8), and by Morgan 
(94, 95), who found the test not very useful under certain conditions 
for prediction of ability in a technical-industrial high school. 

Analysis of the abilities measured by the tests is the problem of several 
investigations. Tinker (141) showed that while speed, level, and power 
scores on the Minnesota Paper Form Board Test vary somewhat in- 
dependently among college students, nevertheless power scores are largely 
accounted for by speed scores, and somewhat by level scores. Rusmore 
(111) found no significant sex differences in scores on the Crawford 
Test of Tridimensional Structural Visualization applied to college students. 
Low correlations between successive trials suggest that the function 
measured by the test changes with practice. Tuckman (145) found little 
overlap between the Minnesota Paper Form Board and the O’Rourke 
Mechanical Aptitude Tests. Traxler (143) found that the Minnesota Paper 
Form Board and the Bennett Mechanical Comprehension Tests correlated 
with group intelligence tests about as much as with one another. Bates, 
Wallace, and Henderson (10) studied four mechanical ability tests, finding 
intercorrelations ranging from -.01 to .52. Men were superior to women 
on the spatial relations and mechanical aptitude tests, but no significant 
sex differences were found on the Minnesota Paper Form Board and the 
O’Connor Wiggly Block Tests. Steel, Balinsky, and Lang (127) found low 
correlations between the O’Rourke Ringing an Electric Bell worksample, 
and the O’Connor Tests of finger and tweezer dexterity, and the Minnesota 
Rate of Manipulation Tests. Significant sex differences in scores on 
the worksample were found. 

Jones and Seashore (73) have reviewed findings in the development 
of fine motor and mechanical abilities, and have discussed the nature of 
tests of mechanical and motor abilities. Developmental studies show that 
girls are only slightly retarded during adolescence, as compared with 
boys. Most of the currently used tests in these areas measure spatial 
relations, manipulative speed, or efficiency in the assembling of small 
mechanisms. There is little evidence of the existence of broad factors, 
such as manual dexterity, in the field of mechanical abilities. 

Several studies have concerned themselves with prediction of abilities 
in specialized curriculums. Morgan (95) used the MacQuarrie Test, the 
Minnesota Paper Form Board, and a revision of the Army Alpha Test 
in a study of pupils in grade eight in a technical-industrial high school. 
He found the tests useful in practical vocational guidance when used 
along with other information about the individuals guided. McDaniel 
and Reynolds (90) found that a battery of three tests, namely the 
Bennett, MacQuarrie, and O’Rourke Tests of mechanical abilities, yielded 
a multiple correlation of .47 with instructors’ ratings of ability of high- 
school students in mechanical training courses. 

The prediction of success in mechanical occupations has been an- 
other subject of investigation. An unusual approach is that of Piotrowski 
et al. (104) who found, by an item-analysis technic, that four different 
signs in the group Rorschach Test differentiated between good and poor 
workers, in a small sample of young male mechanical workers. These 
results need confirmation, however, since the investigators did not ex- 
plore the results with a second sample. Jacobsen (71) used a battery 
of manual and mechanical ability tests to predict achievement in aircraft 
skills, finding multiple correlations ranging from .42 to .61 when 
only two tests were used. McMurry and Johnson (91) presented evidence 
of high validity of the Thurstone Identical Forms and the Bennett 
Mechanical Comprehension and the Minnesota Rate of Manipulation 
Tests, when used along with interviews, in the selection of mechanical 
workers. The Wonderlic Personnel Test and the Army Beta Test were not 
useful, but correlations with on-the-job ratings of employees ranged from 
.64 to .71 for the Thurstone and Bennett Tests. 

Teegarden (134) investigated occupational differences in manipula- 
tive performance of applicants at a public employment office. Normative 
materials for the Kent-Shakow Test, and for spatial relations, placing, 
turning, and plier dexterity tests were presented in graphic and tabular 
form, for men in nine occupational groups and women in seven occupa- 
tional groups. Such groups differed more in problem-solving ability, 
accuracy of movement, and ability to react to complex collections of 
details than in coordination and rate of manipulation. 


Tests of Gross Motor Abilities 


Individual differences in gross motor abilities have been investigated 
by Thompson (136), who tested all the boys in a junior high school in 
New Mexico. The tests used were the baseball throw for distance, base 
running, chinning, the sixty-yard dash, jump and reach, and shot-put. 
When the groups were equated for age, height, and weight, the Mexican 
boys were superior to the Anglo-American boys in all the tests, and 
significantly superior in five, the exception being the shot-put. 

Numerous studies have been made of the usefulness of particular tests. 
Melton (92) showed that a rotary pursuit test, two coordination tests, 
and a discrimination reaction time test were valid in selection of army 
pilots. Bookwalter (16) obtained validity coefficients varying between 
.81 and .86 for four methods of measuring motor fitness. Schroeder (114) 
evaluated archery scores as predictive of individual persons’ motor abili- 
ties, investigating the effects of practice and fatigue, and showing that 
the ordinary lesson in archery is too short to provide satisfactory meas- 
ures. Hartman (60) studied the hurdle jump in relation to other motor 
tests on young children, finding the various tests sufficiently reliable, and 
indicating the need for several tests in order to secure adequate meas- 
urement of motor ability of young children. 

Much more research is needed in this area. Numerous normative 
studies are needed, as well as analytic studies in the development of bat- 
teries of tests of general and of special abilities. It would be helpful if 
standardization could provide alternative teams of tests, and indicate 
their comparability. Socio-economic group differences, and the effects of 
training upon tested abilities are topics requiring further investigation. 


Clerical Aptitudes and Abilities 


A few studies have dealt with analysis of the complex of clerical abilities 
and their correlates. Thus Woody (150) investigated the O’Rourke 
Clerical Aptitude Test and a special mathematics examination as used 
in the senior high school, showing differences in relation to age, sex, and 
size of school. Klugman (78) found no significant relationships between 
scores on the Blackstone Test of Stenographic Proficiency and per- 
manence of clerical interests. Permanence of clerical interests seemed also 
independent of scores on tests of intelligence and typing, and inde- 
pendent of age and school grade. In another study, Klugman (77) found 
significant gains in scores on the Minnesota clerical test and on the 
clerical interest scale of the Strong Vocational Interest Blank, when 
students were tested at the beginning and end of a year of commercial 
schooling. 

Numerous studies have indicated the validity of particular tests. Ober- 
heim (98) found validity coefficients of .66 for men and .54 for women, 
when the NIIP Clerical Test was used to predict proficiency in a library 
course. There were significant sex differences in the validity of individual 
tests in the battery. Lennon and Baxter (84) investigated a clerical em- 
ployees checklist constructed by supervisors, finding it useful in predict- 
ing speed and understanding of the work, but not valid for predicting 
accuracy, nor the personal factors involved in success. Swem (132) 
showed that ability in homework in accounting was not closely related 
to scores on the Minnesota clerical test. Hay and Blakemore (62) applied 
the Minnesota clerical test to experienced and inexperienced applicants for 
clerical work, finding statistically reliable differences in favor of the 
experienced group. Little relationship was found between scores and 
experience beyond one year. The differences in scores of the two groups 
on the clerical test were not explainable in terms of age, intelligence, and 
school training; apparently the differences are due to differences in native 
abilities. In another study, Hay (61) obtained a validity coefficient of 
.70 for the Army Alpha Number Series Test (Nebraska revision), the 
Fryer Name Finding Test, and the Minnesota Numbers Test, when these 
were used to predict success in machine bookkeeping. 


Driving 


Any survey of traffic accidents indicates the educational importance 
and the economic and psychological significance of research on auto- 
mobile drivers. Driver education is of course a major factor in school 
safety education programs. The emphasis upon this topic in recent 
research is therefore [remainder of sentence illegible in source] 

A major group of studies has been concerned with testing drivers. 
Allgaier (2) [text illegible in source] 
of commercial vehicles. Road tests were considered more important 
[text illegible in source] 
tests. Among the psychophysical tests, visual acuity was rated most im- 
portant, distance judgment second, and reaction time third. The use of 
profiles in selection was recommended. Kerr (74) considered self-report 
data of little value; he recommended extension of the use of tests such 
as are employed in selection of drivers for public vehicles. Hutter and 
Dieter (70) demonstrated that ability to pass a night glare test could be 
markedly improved by administration of vitamin A, and that the ability 
varies in the same individual from time to time. Truog (144) described 
a test for fire motor drivers, which includes five tasks, namely: driving 
in a straight line, steering within close limits, stopping smoothly when 
going twenty miles per hour, stopping precisely at a painted cross on the 
street, and parking the car against the curb in regulation parallel parking. 
After reviewing tests for drunkenness, Forbes (41) concluded that in- 
dividual differences are so great that it is not desirable to attempt to 
set a fixed level of alcohol in the blood or urine as indicative of un- 
fitness for driving. 

Another large group of studies has dealt with the causes of accidents. 
Farmer (39) noted that drivers of commercial vehicles are most often 
involved in accidents, and he recommended higher standards for truck 
drivers. He noted that while the act of driving does not require high 
intelligence, the avoidance of accidents does require mental alertness. 
Smith (126) concluded that the dependence of accidents upon drunken- 
ness is greatly exaggerated; he pointed out the advantages and limita- 
tions of blood tests. Schrenk (113) analyzed causes of accidents, finding 
the causation complex, with human factors predominant. If his analysis 
is correct, the most effective tests will emphasize perceptual factors and 
mental attitudes of drivers. Rawson (107) studied accident proneness, 
and reported that the only methods of prevention at present available 
are based upon selection or licensing tests, the study of past records, and 
elimination of unfit drivers. He reported that accident-prone persons 
tend to be impulsive rather than thoughtful, and that they tend to reject 
authority and personal responsibilities. 


Vocational Selection 


In various sections of this Review, specialized tests have been dis- 
cussed, and their uses in vocational selection reported. In this section are 
included only those studies not reported elsewhere. 

Carlson and Rich (21) have reported high reliability and validity for 
a visual adaptation of Thurstone’s Auditory Code-Aptitude Test; the visual 
test was used in a naval training school for signalmen. Harmon and Di 
Michael (57) have presented evidence of the reliability and validity of a 
new test for telegraph operators which presupposes auditory discrimina- 
tion and attempts to measure associative memory and concentration. 

Relatively few studies (in view of the importance of the problem) have 
been made of the use of tests of sales abilities. Kirkpatrick (75) pointed 
out some of the difficulties in the use of tests in this field, noting that the 
most useful tools have been standardized personal history blanks, interest 
tests, personality tests, and interviews. Hilgert (64) reported that only 15 
percent of companies use tests, and gave the reasons why the other 85 
percent did not use them; he also listed the most used tests. Flemming 
and Flemming (40) used six tests in studying applicants for selling jobs. 
The tests were the Bernreuter Personality Inventory, the Moss Social 
Intelligence Test, the Washburne S-A Inventory, the Otis Self-Administer- 
ing Test of Mental Ability, the Canfield Tests of Sales Knowledge, and the 
Strong Vocational Interest Blank. The analysis was qualitative, not statisti- 
cal; the pattern of test scores was considered in relation to the pattern 
needed for the particular job. The patterns needed varied for different jobs 
and for the same jobs in different companies. The qualitative evaluations 
of applicants were reported as highly valid in the selection of salesmen. 
Executives for five companies involving 218 salesmen estimated that 80 
to 90 percent of the analyses were accurate in their descriptions and 
evaluations of the men employed. 

A considerable number of studies have dealt with the use of visual 
tests in industry. Weston (148) reported that fitting workers with suitable 
glasses improved production in fine work, and resulted in marked im- 
provement in feelings of satisfaction and comfort. Stump (131) presented 
evidence that use of visual screening tests would markedly reduce the 
accident rate in an industry, finding significant differences in visual 
acuities of workers in relation to accident records. Lueck’s review (87) 
indicated that careful study of individual differences in vision would be 
valuable in fitting persons into industrial work more efficiently. Minton 
(93) discussed the visual requirements of industrial jobs, dividing the 
jobs into four groups depending upon their visual requirements. Tiffin 
(140) indicated how profiles may be used in picking out the better oper- 
ators in several occupations. Since visual skill patterns are associated with 
success on certain jobs, Tiffin recommends the validation of particular 
visual tests for particular groups of industrial jobs. 


CHAPTER IV 


Personality Questionnaires 
ALBERT ELLIS 


Several noteworthy critical reviews of personality questionnaires have 
appeared during the last three years. Traxler (83), in his latest survey 
of the field, concluded that the use of personality questionnaires in guid- 
ance programs is still questionable. Maller (57) came to much the same 
conclusion, pointing out, however, that personality questionnaires are 
rarely given under the conditions prevailing during the standardization 
process. Meehl (62) suggested that the main fault with present-day per- 
sonality questionnaires lies not in their being “structured,” but in the 
casual a priori item-construction that often goes into them. Ellis (26) 
surveyed over 200 validity experiments and concluded that personality 
questionnaires are of dubious value in distinguishing between groups of 
adjusted and maladjusted individuals, and of much less value in individual 
diagnosis. 

Other surveys were published by Durflinger (24), who reviewed the 
personality tests generally used in college prediction; by Hunt, Wittson, 
and Harris (42), who discussed the use of “screen” tests in military selec- 
tion; and by Malamud (56), who discussed psychological testing in 
psychopathological research. 


New and Revised Instruments 


New and revised personality questionnaires have continued to appear. 
The Cornell Service Index (18), originally devised for military work, was 
put on the market for more general application. Dodge (21) brought out 
Form S-C-T of his Occupational Personality Inventory, designed especially 
for work with clerical workers, salespeople, and teachers. Factors G-A-M- 
I-N of the Guilford-Martin Inventory (60) were isolated and published. 
Johnson (43) came out with the Johnson Temperament Analysis, a 182- 
question scale purporting to measure several personality traits. MacNitt 
(55) published the ninth edition of his Personality and Vocational Guid- 
ance Test. McKinley and Hathaway (53, 54) continued intensive work 
with the Minnesota Multiphasic Personality Inventory, and reported scales 
for depressives, hysteria, hypomania, and psychopathic deviates. Schram- 
mel and Garbutt (78) published a Personality Adjustment Scale. Shipley 
and Graham (79) brought out the Personal Inventory, utilizing the forced 
choice technic. 

A great many personality questionnaires, not yet released for general 
distribution, were reported in research articles. Bennett (5) reported 
a high measure of validity for Slater’s neurotic inventory. Cason (15) 
claimed reasonably high reliabilities for a 317-item questionnaire for 
prisoners. Drake (23) devised a special Thinking and Emotional Intro- 
version-Extroversion Scale for the Multiphasic Test. Geddes (32) pre- 
sented a fifty-item questionnaire for seventh and eighth grade boys, but 
gave no norms. Gray and Wheelwright (34) developed a seventy-five-item 
questionnaire based on Jung’s psychological types. Martin (58) published 
a paper on his Worry Inventory devised for use with university students. 
Maslow and his associates (61) reported satisfactory reliability and 
validity for a clinically derived questionnaire to measure the psychological 
security-insecurity of college students. Runner and Seaver (76) published 
a report on their Personality Analysis Test. Watson (85) reported on the 
validity of the Watson-Fisher Inventory of Affective Tolerance. Werner 
and Carrison (86) claimed to be able to distinguish brain-injured from 
normal children by a questionnaire on animistic thinking. Burgess and 
Wallin (12, 13) devised a moderately reliable engagement adjustment 
scale, for predicting happiness in marriage. Jurgensen (44) reported 
that his Classification Inventory, constructed for use in industrial employ- 
ment situations, showed satisfactory validity and reliability. Several 
writers (4, 63, 68) published reports on questionnaires designed to dis- 
tinguish neurotics from adequately adjusted men in the army or navy. 


Reliability and Validity Evaluations 


The one notable study of questionnaire reliability was that made by 
Cuber and Gerberich (19). They took sixty widely used questions from the 
Bell Inventory, the Thurstone Attitude Scales, and other questionnaires; 
submitted these at three different times to 132 sociology students; and 
found that 72 percent of the responses were consistent. Factual questions, 
oddly enough, showed a lower consistency than did attitudinal and evalua- 
tional questions. 

Validity, rather than reliability, is still the hub of the entire matter 
of testing personality by the questionnaire method. Kornhauser (46) 
asked seventy-nine noted psychologists how satisfactory or helpful for 
present practical use they considered personality inventories of the Bern- 
reuter, Bell, and Humm-Wadsworth type. Only 1.5 percent of these psychol- 
ogists replied that they considered them highly satisfactory; 13.5 percent 
thought them moderately satisfactory; the rest deemed the questionnaires 
doubtfully satisfactory, rather unsatisfactory, or highly unsatisfactory. 

A great many validity studies dealing with the Multiphasic Inventory 
have appeared during the last three years. About half of these studies 
gave evidence of positive validity; the other half indicated either 
lack of validity, or only weak validity. Validity studies of other per- 
sonality questionnaires also turned up a wide array of results; however, 
applications of questionnaires for “screening” purposes in the army and 
navy seem to have been rather uniformly successful. For details concerning 
all these studies, the reader is referred to the review by Ellis (26). 

In a recent report, Congdon (17) found a tendency for college students 
who made lower grades on the Mooney Problem Checklist actually to 
have more problems. Zapf (88) found that children’s responses to a 
questionnaire on superstitions correctly forecast their actual behavior in 
75 percent of the instances. Adams (3) used the Adams-Lepley Personal 
Audit, the Guilford-Martin Personnel Inventory, and the Terman Predic- 
tion Scale in a study of the prediction of adjustment in marriage. He 
found low correlations for the first two of these questionnaires and slightly 
higher ones for the third scale. Burgess and Wallin (12) reported that their 
engagement adjustment scale correlated .43 and .51 for men and women, 
respectively, after three years of marriage. 

A factor affecting the validity of personality questionnaires is the degree 
of honesty of responses by the subjects. Fischer (29) found that the mean 
number of serious problems checked by 102 college students on the 
Mooney Problem Checklist was significantly greater when signatures were 
withheld, than when signatures were required. Other studies concerning 
honesty of response to personality questionnaires have been reviewed by 
Ellis (26). In general, it appears that personality questionnaires can be 
“faked,” and that complete truthfulness is not to be expected when lack of 
truthfulness would better suit the convenience or purposes of the subjects. 


Construction and Scoring Technics 


In the field of scoring technics, Burton and Bright (14) published a 
method of scoring the Multiphasic Personality Inventory, involving the use 
of punched cards. This method is claimed to reduce scoring-time to four 
minutes per test. Kempfer (45) and McClelland (52) proposed simplified 
methods for scoring the Bernreuter, which save a great deal of scoring- 
time with only a small loss of test reliability. Schmidt and Billingslea (77) 
offered a technic for constructing profiles from regular Bernreuter scores. 
These profiles, they claimed, differentiated normal from deviant in- 
dividuals with approximately 80 percent accuracy. 


Factor Analysis 


The use of factor analysis continued as an important tool. Lovell (51) 
submitted the Guilford-Martin Inventory of Factors STDCR, GAMIN, and 
the Personnel Inventory to 200 college students and discovered six super- 
factors. Cattell (16) continued his exhaustive work on trait clusters, and 
organized 131 phenomenal clusters previously obtained into fifty nuclear 
ones. Brogden and Thomas (10) worked with twenty-five of the items most 
heavily loaded in the Bernreuter Sociability Scale and found five primary 
factors among them, which they named intellectual independence, gregar- 
iousness, slowness of reaction, need for primary human relationship, and 
intellectual leadership. 


Applications in Educational Appraisal and Guidance 


Applications of personality questionnaires in educational areas have 
been many and diverse. Blair (7) studied the personality adjustment of 
ninth-grade pupils with the California Test of Personality as well as the 


a 











Review oF EpucaTIoNAL RESEARCH Vol. XVII, No. 1] 





Multiple Choice Rorschach Test, and found no high relationships between 
the two tests. Engle (27) used the Bell Adjustment Inventory to see if 
overage school children differed from normal ones, but found no out- 
standing differences on any of the Bell scales. Ogan (67) investigated the 
wartime problems of college students with a problem checklist and dis- 
covered a high incidence of frustration, hysteria, cynicism, despair, and 
misapprehension. Woolf (87) studied the relationship between home 
adjustment and the responses of junior-college students to the Bell 
Inventory, reporting that a poor home adjustment is accompanied by 
unsatisfactory behavior on the part of the students. Mooney (65) utilized 
his own Problem Checklist on freshman girls and discovered that they 
had an average of thirty problems, with 60 percent of them desiring an 
individual conference to discuss their problems. Houston and Marzolf (41) 
found that the Mooney Problem Checklist could be very helpful when 
given to students and then discussed by the faculty members. Pugh (70), 
employing the Symonds Adjustment Questionnaire on Negro students in 
mixed and in separate high schools, reported little difference between these 
groups in their total adjustment scores. 

Several investigators employed personality questionnaires in an effort to 
discover significant relationships between scholastic achievement and 
personality self-ratings. Griffiths (36), using the Bell Adjustment Inven- 
tory, found no better adjustment for men with brilliant scholastic records 
in college than for men of lowest academic achievement. Spinelle and 
Nemzek (81), employing the Link Inventory of Interests and Activities, 
found that the correlation yielded by the inventory and measures of school 
success was low, and did not possess direct value for educational or voca- 
tional guidance. Thompson (82) gave the California Test of Personality to 
dental school students and found very low correlations between their test 
scores and their theory and technic and practicum scholastic criterion 
scores. Typical of several other studies, Bennett and Gordon (6) reported 
that the Bernreuter Inventory, when used with nursing school students, 
was of little or no predictive value. 

Paper and pencil tests of personality have also been rather widely 
used in the last three years, in attempts to measure teaching success. 
Dodge (20) gave his Occupational Personality Inventory (21) to 301 
teachers and found that those rated by their supervisors as more suc- 
cessful reported themselves on the test to be (a) more at ease in social 
contacts; (b) more willing to take initiative and to assume responsibility; 
(c) less subject to fears and worries; (d) more sensitive to the opinions 
of others; and (e) slower to make decisions. Gotham (33) used the 
Bernreuter, Washburne, and Rudisell Inventories with elementary-school 
teachers and tried to determine the relationship between scores and 
degree of change effected in the pupils by the teachers under observation. 
No significant relationship was observed. Lough (50) gave the Multiphasic 
Personality Inventory to 185 unmarried women students in a teachers 
college and reported that they were a relatively stable, normal group 
with a very slight tendency toward hypomania. Valentine (84) devised a 
schedule of fifty questions for professors to ask themselves in order to 
check on their own teaching proficiency, but reported no norms for the 
test. Retan (72) used the Pressey and Bernreuter Tests to investigate the 
relationship between emotional instability and teaching success. She con- 
cluded, interestingly enough, that records of the 152 individuals she 
studied “indicate that emotional instability is not conclusive evidence of 
unfitness for teaching” (72, p. 141). Bollinger (8) employed the Wash- 
burne Social Adjustment Inventory, the Bell Adjustment Inventory, and 
the Symonds’ What Kind of a Year Are You Having Tests in his study 
of the social impact of the teacher on the pupil. He found some significant 
differences among the groups of teachers in three different schools. 


Clinical Diagnosis and Treatment 


Many recent applications of personality questionnaires have been in the 
area of clinical diagnosis and treatment. Brozek, Guetzkow, and Keys (11) 
utilized the Multiphasic Test in a study of the personality changes occur- 
ring in normal young men maintained on restricted intakes of vitamins 
of the B-Complex. Grinker and his coworkers (37) used a 121-item 
questionnaire to investigate predisposition to operational fatigue. Pratt 
(69) administered a questionnaire to 267 boys and 303 girls in a study 
of the fears of rural children. He found that girls have more fears than 
do boys, and that there was some evidence that the number of things 
feared increased with age. Rashkis and Shaskan (71) employed the Mul- 
tiphasic Inventory to evaluate the results of group psychotherapy. Richard- 
son (75), using the Guilford-Martin Inventory of Factors STDCR, found 
the stutterers to be significantly different from non-stutterers in social in- 
troversion, depression, and happy-go-lucky tendencies. 


Surveys of Specific Groups 


Personality questionnaires have been frequently used in studies of racial, 
religious, sex, or other groups. For example, Engle and Engle (28) em- 
ployed them with Amish and non-Amish school children; Kuhlen (47) 
compared the Pressey scores of Japanese, Chinese, and white pupils in 
a Hawaiian high school; and Long (49) utilized the Bell Adjustment In- 
ventory in a study of Jewish and non-Jewish subjects. This method of 
employing personality inventories is not usually a propitious one for 
several reasons: (a) the investigator (or his readers) tends to assume a 
validity for the instrument which has seldom been established; (b) false 
emphasis is often placed on intergroup rather than intragroup differences; 
(c) dangerous, anti-democratic “facts” are sometimes sought and found; 
(d) the results are more often than not meaningless or unimportant. 


Occupational Guidance and Selection 


As usual, several published reports have dealt with the use of personal- 
ity questionnaires in occupational guidance and selection. Abramson (2) 
used the Minnesota Multiphasic Personality Test for the selection of 
specialized military personnel and found it fairly helpful. Dorcus (22) 
studied the Humm-Wadsworth and the Guilford-Martin Personnel Inven- 
tories in an industrial situation and reported that caution should be 
exercised in their use. Forlano and Kirkpatrick (31) used the Bell and 
Washburne Tests in the selection of radio-tube mounters and found a high 
degree of relationship between test scores and supervisors’ ratings of 
employees. Harmon and Wiener (40) used the Multiphasic Inventory as 
part of a test battery for the vocational diagnosis of disabled veterans 
applying for rehabilitation, and found it an instrument of prime utility. 
Martin (59), working with the Guilford-Martin Personnel Inventory 
on aircraft and textile employees, claimed that it was able to disclose from 
82 to 85 percent of the workers who, in management’s opinion, later proved 
to be malcontents. Mittelmann and his associates (64) administered the 
Cornell Selectee Index and the Cornell Word Form Test to industrial per- 
sonnel and found that they both differentiated significantly between in- 
dividuals with moderately severe or severe personality disturbances and 
those without such disturbances as revealed by a psychiatric interview. 


Studies of the Nature and Dynamics of Personality 


As might be expected, a good many recent studies of the nature and 
dynamics of personality have had recourse to personality questionnaires. 
Three studies involving parental variables yielded positive results: Lewis 
(48) found that children whose parents are rated by teachers as having 
“superior” attitudes toward the child and the home do, in general, obtain 
more desirable personality test scores than children whose parents received 
“inferior” ratings. Dyer (25), employing the Bell Inventory on 100 “only” 
and 100 “non-only” children, reported that, in regard to total test scores, 
the “only” children seemed to be about as well adjusted as the “non-only.” 
In regard to the “home” and the “emotional” areas of the Bell Test, the 
“only” children made somewhat better scores than the “non-only.” Smith 
(80), applying the Terman-Miles Masculinity-Femininity Test to sorority 
girls and their parents, found tendencies for the more decidedly “feminine” 
girl to have a more “feminine” mother and a more “masculine” father. 

Morgan (66) applied the Loofbourow-Keys Personal Index to a group 
of visually handicapped twelve-year-olds. This group exhibited a higher 
degree of personality and social maladjustment on the index than did 
normal children. In a series of four papers, Hanawalt and Richardson 
(38, 39, 73, 74) found that some of the Bernreuter scales distinguished 
significantly between various kinds of leaders and nonleaders, while others 
of the scales did not. 

Lack of relation between personality questionnaire scores and other data 
was reported in four studies: Fiske (30) found no direct relationship be- 
tween somatotype groupings and scores on the Bernreuter Personality 
Inventory. Boynton and Wang (9), using the Boynton Personality Inven- 
tory, found little relationship between children’s play interests and their 
emotional stability scores. Gray (35), utilizing a sample of 600 sixth-grade 
pupils, found no statistical difference between emotionality scores on the 
Boynton Personality Inventory and a measure of their variability on 
achievement tests. Abramson (1), employing the Multiphasic Test, 
found that, in general, a subject expressed the same attitudes when mildly 
under the influence of alcohol as when sober. 


Summary 


An examination of research studies in the field of personality question- 
naires during the last three years leads to the following conclusions: 


1. Paper and pencil tests of personality are still being very widely used 
by educators, psychologists, and sociologists for both research and clinical 
purposes. 

2. Interest has shifted largely from the older personality inventories to 
the newer ones like the Guilford-Martin, Humm-Wadsworth, Cornell, and 
—especially—the Minnesota Multiphasic questionnaires. 

3. While experimenters continue to report satisfactory reliabilities for 
most of the tests employed, validity studies bring forth many unsatisfactory 
and highly questionable results. Authors of tests tend to find their instru- 
ments quite “valid,” but other observers frequently do not corroborate 
these findings. 

4. The validity of personality questionnaires seems to be much higher 
for some uses than for others. For purposes of distinguishing between 
good or bad students or teachers, the tests are woefully inadequate. In 
clinical diagnosis, their record is somewhat better. In occupational situ- 
ations, and in military screening, it seems, on the basis of the most 
recent reports, that the inventories give fairly satisfactory results. 

5. There is a continued pernicious tendency on the part of many ex- 
perimenters to employ personality questionnaires whose validity is still 
very much in doubt and, on the basis of scores on these tests, naively divide 
their subjects into “neurotic” and “normal,” or “introverted” and “ex- 
troverted,” or some similar dichotomous groupings. 

6. There can be no doubt whatever that a great deal remains to be 
done in the construction, evaluation, and application of personality inven- 
tories. Further research designed to increase test validity is still the crying 
need in this area. 


CHAPTER V 


Interests and Attitudes 


ALBERT ELLIS and J. RAYMOND GERBERICH 


Interests and attitudes pervade a large proportion of all behavior, and
may correspondingly be inferred, by various technics of observation 
and measurement, from a wide variety of human responses and activity. 
The present chapter, however, is confined to the verbal-response type of 
measurement which characterizes most “tests.” This is a limitation which 
must be borne in mind, since verbal-response tests in this sector—as, 
indeed, thruout the whole field of personality—cannot be relied upon for 
the whole story. Some nonverbal, or less highly verbal, technics of per-
sonality measurement are considered in the next two chapters of this issue.


Interests 
Surveys and Reviews 


Outstanding among all the recent studies of interest inventories was the 
publication, late in 1943, of Strong’s Vocational Interests of Men and 
Women (110). This work surveys vocational-interest testing from its 
beginnings to the present. The book covers the important materials on 
the Strong Test in so thoro a manner as to preclude adequate considera- 
tion in the space allotted here. Fortunately, Super (114) has already pub- 
lished an excellent comprehensive review. Suffice it to say that Strong’s 
treatment is characterized by excellent organization, lucidity of style, 
objectivity, and freedom from exaggerated claims—qualities not always 
found in the proponent of a specific testing procedure. 

Several of the findings detailed by Strong should be of special interest 
to educators. Thus, he shows that patterns of interests are already clear 
and stabilized enough at adolescence to serve as useful guides in vocational 
counseling; that there is a high degree of relationship between scholastic 
interests and graduation from a selected course, altho not between scholas- 
tic interests and grades in the course; and that there seems to be little 
difference in the teaching-interest scores of successful and unsuccessful 
teachers. 

Another very important review of interest testing that cannot, because 
of space limitations, be adequately reviewed here, is Carter’s (15) Voca- 
tional Interests and Job Orientation. This brief but comprehensive ten- 
year survey of the field emphasizes several significant points, including the 
contention that the measurement of vocational interests by means of 
modern inventory technics is about as reliable as the measurement of 
intelligence by means of group tests (15, p. 68). Like Strong’s book, 
Carter’s monograph is must reading. 

Two other noteworthy surveys of vocational interest tests, by Berdie 
(4) and Hahn (48), appeared during the last three years. Berdie con-
cluded that vocational interests arise not from one main factor, such as
ability, schooling, or family influences, but from a multiplicity of almost 
all possible conditions. Hahn showed that norms of the Kuder Test are 
still inadequate, and that validity is more assumed than proved. Hahn did, 
however, find unusual promise in terms of data now being processed by the 
test’s author. 


New and Revised Instruments 


Several new or revised interest inventories appeared during the last 
three years, some of them especially adapted for school use. Dunkel (30), 
for example, published a report of an Inventory of Students’ General Goals 
in Life. Horrocks (59) experimented with an interest-in-subject test, which 
proved to have reasonable validity when used with high-school and junior-
high-school students. Jones (61) put out the ICW Interest Record for
use with children. Barry (2) published some Kuder Preference Record 
norms based on measurements made on 1500 high-school seniors. Cleeton 
(16) brought out a revised edition of the Cleeton Vocational Interest In- 
ventory, Form A. 


In the occupational field, Brainard and Brainard (11) published an 
Occupational Preference Inventory. Larus (71) brought out a Vocational 
Preference Index. Lee and Thorpe (73) constructed an Occupational In- 
terest Inventory. Older (92) and Super and Haddad (115) reported on 
the Super-Older Vocational Interest Test. This last instrument departs 
somewhat from the conventional interest inventories, in that the subject 
is asked to answer a set of true-false questions based on films of occupa- 
tional activities. Older reports fair agreement between test scores and ex- 
pressed occupational preferences. It would appear that more research might 
well be done with interest tests of this type. 


Reliability and Validity Evaluations 


A fair amount of work was done during the last triennial period on the 
evaluation of interest inventories. Hartson (57) made a follow-up study of 
the Oberlin Vocational Interest Inquiry fourteen years after its original 
use, and found a correlation of .72 between the scores of twelve subjects 
tested on the two different occasions. Triggs (118) studied the relation of 
Kuder Preference Record scores to other measures, and found them to be 
reliable enough for use in counseling individuals. She also found a fair 
amount of agreement between interests as measured by the group scales of 
the Strong Vocational Interest Blank for Men and scales of the Kuder. 
Wittenborn, Triggs, and Feder (124) compared scores on the Strong 
and Kuder blanks and found agreements in some scales and disagreements 
in others. Triggs (119) again compared Strong and Kuder scores and 
found reasonable agreement. Thompson (116) found some degree of cor-
relation between Kuder scores and the success of dental-school students.
Bolanovich and Goodman (10), on the other hand, discovered low cor-
relations between Kuder scores and final grade averages of sixty-six RCA
Cadettes enrolled in a training program for electrical engineering aides. 
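
The retest figure reported by Hartson is a product-moment correlation between scores obtained on two occasions. As a minimal sketch of that computation, the following uses twelve pairs of invented scores; the data and the function name `pearson_r` are illustrative assumptions and are not taken from any of the studies cited.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for twelve subjects on two occasions (invented data).
first_testing = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16, 13, 19]
second_testing = [13, 14, 11, 19, 15, 12, 16, 17, 9, 18, 12, 20]
retest_r = pearson_r(first_testing, second_testing)
```

With so few subjects the coefficient has a wide sampling error, which is one reason a retest correlation based on twelve cases should be read with caution.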


Construction and Scoring Technics 


In the field of construction and scoring technics, the controversy between 
the proponents of simple and complex weightings of item responses con-
tinued. A paper by Strong (111) on weighted versus unit scales concluded
that unit scale scores, if employed with the Strong Vocational Interest 
Blank, would “lead to different counseling from weighted scale scores in 
from one-sixth to one-twelfth of the cases” (111, p. 215).
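
The distinction between weighted and unit scales can be illustrated with a short sketch. It assumes, for illustration only, that each item offers the responses like (L), indifferent (I), and dislike (D), that a weighted key attaches an integer weight to each response, and that the unit key keeps only the sign of each weight. The key and the subject's answers below are invented and are not Strong's.

```python
def sign(weight):
    """Reduce a weight to +1, 0, or -1."""
    return (weight > 0) - (weight < 0)

def scale_score(responses, key):
    """Sum the key weight attached to each response the subject gave."""
    return sum(item_key.get(choice, 0) for choice, item_key in zip(responses, key))

# Hypothetical weighted key for four items (invented weights).
weighted_key = [
    {"L": 4, "I": 0, "D": -3},
    {"L": -2, "I": 1, "D": 2},
    {"L": 1, "I": 0, "D": -1},
    {"L": 3, "I": -1, "D": -4},
]
# Unit key derived from the weighted key by keeping only the sign.
unit_key = [{opt: sign(w) for opt, w in item.items()} for item in weighted_key]

subject = ["L", "D", "I", "L"]
weighted = scale_score(subject, weighted_key)  # 4 + 2 + 0 + 3
unit = scale_score(subject, unit_key)          # 1 + 1 + 0 + 1
```

Whether the two scores lead to the same counseling decision depends on where each falls relative to its own norms, which is the question Strong's comparison addressed.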

Kuder also published a paper (70) answering an attack on the method 
of classification of items in the Kuder Preference Record. 

A special scoring key for the Kuder Preference Record was devised by 
the Staff of the Personnel Research Section of the Adjutant General’s Office 
(106) for use in assigning enlisted men to recruiting functions in the army. 
Dunlap and Harper (31) presented a method for making profiles by an 
interest-area method, for use with the Strong Vocational Interest Blank. 


Applications of Interest Inventories: Educational Appraisal 
and Guidance 


As usual, the last three years saw the publication of a good many studies 
in which interest inventories were applied for research and experimental 
purposes. Some of these applications were especially noteworthy in the 
area of educational appraisal and guidance. 

Barrett (1) tested art majors and control subjects on the Strong Scale 
for artists, and found that high scores on the test were more often than not 
associated with successful specialization in art. Berdie (5) gave the Strong 
Interest Blank to engineering students, and discovered that neither aca- 
demic achievement nor the amount of satisfaction expressed by a student 
in his course can be predicted by his score on it. Crider (22) administered 
the Strong Blank to nursing students and found that as a selective or a 
prognostic device it did not prove to be significant. Detchen (26) gave the 
Social Science Interest Test, as well as a comprehensive examination in the 
social sciences, to a group of college students, and found a correlation of 
.78 between their scores on both tests. Klugman (65) used the Strong 
Blank for Women on vocational-high-school students and found that those 
having more permanent clerical interests were not superior to those having 
less permanent interests. He also noted (66) that a year of schooling had 
no tendency to improve the negligible relationship existing between clerical 
interest and aptitude test scores. Lorimer (82) reported on the use of the 
Strong Blank on Columbia College students. In a follow-up study of 241 
students who were advised to enter certain occupations partly on the basis 
of Strong Test results, she found that 82 percent of the group were success- 
fully and happily engaged in those occupations. Roberts (97), using the 
Wonderlic Personnel Test and Kuder Preference Record on graduate engi-
neers, found them to have a strong distaste for clerical work, and noted
that, if these findings were generally true, engineering colleges might well
consider measures to diminish the emotional resistance to clerical work 
needed in engineering. Roeber and Garfield (98) administered a vocational 
preference inventory to 1955 ninth- to twelfth-grade students, and found 
that in general the most favored occupations were accorded much the same 
rank among the different grade levels. Long (81) found a relationship 
between Strong Test scores and Zyve Scientific Aptitude Test scores for 
college students. 


Occupational Guidance and Selection 


As would be expected, several recent studies applying interest inven- 
tories to occupational areas were reported in the literature. Berdie (6) 
constructed an interest scale of twenty-two items for use on successful and 
unsuccessful marine recruits. He found the critical ratio of the difference 
between the mean scores of these two groups to be 9.7 and concluded that 
“beyond all doubt the scale differentiates between the two groups” (6, p.
280). Hahn and Williams (49) experimented with the Kuder Preference
Record on Marine Corps Women Reservists and found that with the use 
of this test three groups of clerical workers—stenographers, clerk-typists, 
and general clerks—could be successfully divided into satisfied and dis- 
satisfied workers. Lehman (75) gave the Kuder to three kinds of home 
economists—teachers, hospital dieticians, and business women. She noted 
that there seemed to be distinct differences among the three groups. Strong 
(112) investigated the interests of forest service men with his own Voca- 
tional Interest Blank and found that, in general, they have interests similar 
to those of skilled tradesmen, particularly farmers, of production managers, 
of engineers, and of public administrators. Strong (113) also investigated 
the interests of senior and junior public administrators, and found them 
to differ somewhat—enough to suggest that a fourth to a third of the 
juniors did not have the interests of the senior administrators. Uhrbrock 
(121) submitted five sets of very comprehensive interest questionnaires to 
242 employees, and presented percent norms for his group. 
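
The critical ratio mentioned in connection with Berdie's marine-recruit scale is the difference between two group means divided by the standard error of that difference. A minimal sketch follows, assuming two independent groups and the usual unbiased variance estimates; the function name and the scores are hypothetical and do not reproduce Berdie's data.

```python
from math import sqrt

def critical_ratio(group_a, group_b):
    """Difference between two means divided by its standard error."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased variance estimates for each group.
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Standard error of the difference between independent means.
    se_diff = sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / se_diff

# Hypothetical scale scores for two small groups (invented data).
successful = [10, 12, 14]
unsuccessful = [4, 6, 8]
cr = critical_ratio(successful, unsuccessful)
```

A large critical ratio shows that the group means differ reliably; it does not by itself show how well the scale classifies individual recruits, since the two score distributions may still overlap.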


Studies of the Nature and Dynamics of Personality 


Altho most interest inventories are designed primarily for vocational 
selection, they are occasionally employed for clinical purposes and for 
studies of personality. Several such uses were reported in the literature of 
the last three years. Harris (54) used a play activity inventory in a study 
of delinquent boys and found it to be both feasible and rewarding. He 
discovered that certain leisure-time interests of delinquent boys are closely 
associated with their behavior. Jones and others (60), in their compre- 
hensive study of an adolescent boy, utilized the ICW Interest Record, the 
Strong Vocational Interest Blank, and the Lehman and Witty Play Quiz. 
Tyler (120), employing the Strong Blank as well as the Minnesota Per- 
sonality Test, found a relationship between social adjustment and Strong 
scores. Berdie (7) used an interest test and the Multiphasic Personality
Inventory on university counseling bureau cases, and found a relationship
between range of interests and scores on five of the Multiphasic Scales. 


Summary 


From the amount of research activity on interest inventories during the 
last three years, it may be concluded that these tests are considered to be of 
real importance by a large group of educators and psychologists. That their 
confidence in the interest inventory is not misplaced is at least partly 
proved by the fact that, in the period under consideration, the majority of 
studies have been favorable. However, there are still enough negative and 
on-the-fence indications to show that much remains to be done in establish-
ing high predictive validity for the most popular of the inventories now
in use. Users of the tests, in both educational and vocational situations, 
must still be warned to be extremely cautious in regard to individual 
diagnosis and prediction. 


Attitudes, Opinions, and Morale 


The present section continues and extends the previous reviews by Trax-
ler (117), Darley and Anderson (24), and Murra (90).


General Methodology 


McNemar’s lengthy review of Opinion-Attitude Methodology (87) re- 
quires special mention. McNemar indicated: (a) that attitude scales and 
single-question opinion technics, respectively, permit only a rank-ordering 
(rather than the precise quantitative measurement) of individuals and of 
groups; (b) that both attitude testers and opinion gaugers are too often 
content with low degrees of reliability; (c) that internal consistency offers 
a criterion of reliability rather than of validity; (d) that opinion pollers 
validate in terms of group voting rather than in terms of individual be- 
havior; (e) that some attitude testers have denied that there is any validity 
problem because, they contend, the verbal expression of attitude has its 
own intrinsic validity; and (f) that attitude and opinion testers too often 
combine dissimilar functions into what they assume to be a meaningful 
whole, instead of developing uni-dimensional scales. Replies to certain of 
McNemar’s critical comments appear in the November 1946 issue of the 
Psychological Bulletin. 


Measurement technics—The most widely used technics are the equal- 
appearing interval method of Thurstone, the internal consistency or sum- 
mated ratings method of Likert, and the interview method. Less widely 
used are methods involving self-ratings, between-group differences, paired 
or “forced choice” statements, verbally stated situations, projective tech- 
nics, and the scalogram (47). 
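
The two scaling methods named first differ chiefly in how a respondent's score is obtained. The sketch below is a simplified illustration under stated assumptions: for the equal-appearing interval method, each statement's scale value is taken as the median pile assigned by judges, and a respondent's score as the median scale value of the statements endorsed; for the summated ratings method, the score is the sum of item ratings on a five-point scale after unfavorable items are reversed. All names and data are hypothetical.

```python
from statistics import median

def thurstone_scale_values(judge_sorts):
    """Scale value of each statement: median pile assigned by the judges."""
    return {item: median(piles) for item, piles in judge_sorts.items()}

def thurstone_score(endorsed_items, scale_values):
    """Respondent's score: median scale value of the endorsed statements."""
    return median(scale_values[item] for item in endorsed_items)

def likert_score(ratings, reverse_keyed=(), points=5):
    """Summated ratings: add item ratings, flipping reverse-keyed items."""
    return sum(
        (points + 1 - rating) if item in reverse_keyed else rating
        for item, rating in ratings.items()
    )

# Hypothetical sortings of three statements into eleven piles by five judges.
judge_sorts = {
    "s1": [2, 3, 3, 4, 2],
    "s2": [6, 5, 6, 7, 6],
    "s3": [10, 9, 10, 11, 10],
}
values = thurstone_scale_values(judge_sorts)
t_score = thurstone_score(["s2", "s3"], values)

# Hypothetical five-point ratings; "s1" is worded unfavorably and is reversed.
l_score = likert_score({"s1": 2, "s2": 4, "s3": 5}, reverse_keyed={"s1"})
```

The first method requires a preliminary judging stage before any respondent is scored, whereas the second needs only the respondents' own ratings; this difference in labor is part of what the comparisons cited in this section examine.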

The equal-appearing interval method was studied by Farnsworth, who 
used a prejudging technic for evaluating scale intervals (38), found con-
siderable merit in the Allport graphic method of scaling (39), and dis-
covered appreciable shifts in item values with the Seashore-Hevner sorting
method when two groups of judges adopted extremely differing attitudes 
by request (40). Edwards (33) studied the neutral items of Thurstone 
scales, while Edwards and Kenney (34) compared the Thurstone and 
summated-ratings or Likert technics. Eisenberg (35) compared two meth- 
ods of scoring results on a like, indifferent, and dislike response pattern. 

Interviewing technics and patterns have received much attention from 
public opinion pollers (12, 44). A study in depth interviewing by Link 
(78) developed a technic which was objective, in the sense of not depend- 
ing upon the characteristics of the interviewer; the results were also readily 
subject to tabulation and analysis. The projective method was used by 
Proshansky (96), who made exploratory use of Murray’s Thematic Apper- 
ception Test. 

Scaling technics less widely used than the Thurstone are a simple non- 
mathematical “scalogram” method reported by Guttman (47), and a tabu- 
lation method studied by Goodenough (45); both methods were used by 
the army in its studies of morale. McCormick (85) suggested the substi- 
tution of a simple chi-square modification for the common scale technics in 
analysis of results. 
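
In scalogram analysis a set of items is regarded as scalable when each respondent's pattern of answers can be reproduced, with few errors, from his total score alone. The following sketch computes a coefficient of reproducibility under one error-counting convention, assumed here for illustration: a respondent with a total score of s is expected to endorse exactly the s most frequently endorsed items, and every cell departing from that ideal pattern counts as one error. Other conventions give different counts, and the data are invented.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for a table of 0/1 item responses.

    responses: one row per respondent, one 0/1 entry per item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # Rank items from most to least often endorsed.
    totals = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -totals[j])
    errors = 0
    for row in responses:
        score = sum(row)
        for rank, j in enumerate(order):
            # Ideal pattern: endorse only the `score` most popular items.
            ideal = 1 if rank < score else 0
            errors += row[j] != ideal
    return 1 - errors / (n_people * n_items)

# Hypothetical response tables (invented data).
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
one_deviant = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
rep_perfect = reproducibility(perfect)
rep_deviant = reproducibility(one_deviant)
```

In the second table the last respondent endorses only the least popular item, which produces two errors out of twelve cells.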


Validity and reliability—The dearth of studies dealing with the trouble- 
some problem of validity is particularly noticeable. Blankenship (9), 
Gallup (44), and Katz (62) studied validity by means of the agreement 
between poll results and election returns. Connelly (17) studied the validity 
both of predicting election returns and of predicting turnout of voters. 

Reliability studies, apart from those which deal with the accuracy of 
scaling, appear to be concerned mainly with agreement of results by 
different polls, with interview technics, with the framing of questions, and 
with sampling adequacy. Cantril (12) compared results from four public 
opinion polls, finding satisfactory agreement. Dodd (29) re-asked ques- 
tions and King (64) compared results of two interviewers in their studies 
of reliability. Eysenck (37) compared several factor analysis studies of 
social attitude. The controlled sample technics of the leading American 
polls were examined by Connelly (17), while Hansen and Hauser (51) 
described the basic principles of the area-sampling or pinpoint method. 
Benson, Young, and Syze (3) experimented with the area-sampling method 
and the secret ballot. Lazarsfeld and his associates (72) made an experi- 
mental study of a panel sampling method over a period of some months. 


Measurement of Attitudes 


Students of attitudes have expanded into the related areas now known 
as opinions and morale. College students and, to some degree, high-school 
students remain the principal subjects, altho some tendency may be noted 
to study younger children and adult groups. Conrad and Sanford (19)
pointed out the practical and theoretical advantages of college samples;
McNemar (87), however, favored the use of cross sections of the general
population (with some restriction as to age). 


New attitude instruments and technics—Altho many scales for the 
measurement of attitudes were developed, only a few were the result of 
research directed primarily toward that result. Hanchett (50) reported 
pretesting results for the first two of what is intended to be a nine-scale 
set for measuring attitudes toward the British. A Likert-type scale measur-
ing anti-Semitism, which was developed by Levinson and Sanford (77),
was studied in relation to a variety of personality factors. Marks (83) 
combined Thurstone and Likert technics in producing scales for testing 
attitudes of Negro youth toward both whites and Negroes. Ferguson (42) 
revised his scales of primary social attitudes. 

Scales, questionnaires, checklists, and other devices were constructed 
as means toward the attainment of other ends in many studies. Altho scales 
(18, 20, 23, 28, 36, 52, 104) were the most common, questionnaires and 
checklists (21, 89, 93, 94, 101, 107) and reports of imaginary or fictitious 
situations (84, 102) were also evolved. 


Studies of attitudinal status—So many investigations of group attitudes 
have been made that it is possible here only to indicate their scope as to 
groups of persons studied and issues toward which attitudes were measured. 
The easily accessible college student was the subject of many investigations. 
Morgan (89) reported on the attitudes of college students toward the 
Japanese. Attitudes of college students toward such issues as student honors, 
vocations, intercollegiate activities, and the United States Constitution were 
surveyed by Knode (67), while Seward and Silvers (102) studied the 
attitudes of college women toward accuracy in newspaper reports. 

Sanford and associates (100, p. 323) studied the relation of “sentiments” 
to behavior and fantasy. Duvall and Motz (32) studied the attitudes of 
girls and young women toward family living. Dinkel (28) reported on the 
attitudes of high-school and college students toward supporting aged par- 
ents, and Smith (104) dealt with the attitudes of children, adolescents, and 
adults toward Soviet Russia. Legislators were surveyed by Hartmann (56), 
and both congressmen and administrators at the policy-making level were 
studied by Kriesberg (69) with respect to their judgments concerning 
public opinion polls. Le Gulf and Hopkins (74) measured the attitudes of 
British propagandist society members toward social and political issues. 
Other studies of special groups were those of Moreton (88) on the attitudes 
of teachers and pupils toward coeducation, of Patrick (93) on the attitudes 
toward women executives in government, and of Stagner (107) on the 
opinions of psychologists with respect to peace planning. 

Studies of attitudinal trends—Studies of attitude changes over long 
periods of time, and of changes presumably resulting from certain learning 
situations, appear to be less numerous than are status studies. Pressey (95) 
noted less concern in 1943 than in 1923 on the part of school and college 
students from grades six to sixteen with regard to social taboos, inhibitions,
and fears. Lentz (76) and Stagner (108) studied attitudinal changes of
college men and young adults, respectively, from prewar to war years on 
issues of war and aggression. Blake and Dennis (8) investigated the de- 
velopments of stereotypes concerning the Negro. 

Calling intraception-extraception “a basic attitude underlying action,” 
Sanford and associates (100, p. 643) concluded that between the ages of 
five and fifteen intraception follows a U-shaped course of development, 
being most pronounced for the youngest and oldest subjects. 

The effects of a motion picture having a Nazi theme upon high-school 
pupils’ attitudes were investigated by Wiese and Cole (122), thru the use 
of oral and written reports. Using several Thurstone scales, Smith (105) 
found greater homogeneity of attitudes among college students after the 
study of sociology than before. Di Michael (27) studied changes in teach- 
ers’ attitudes toward pupil behavior as a result of taking mental hygiene 
and educational guidance courses. 


Correlates and effects of attitudes—The correlates of attitudes which 
have received most attention appear to be those commonly studied under 
the psychological heading of individual differences, altho attitude testers 
have also gone considerably farther afield in the selection of some variables. 
Fewer recent studies have dealt with such relationships as that between 
information and attitude, or with the effects of attitudes upon learning. 

Crespi (20) used a social rejection thermometer similar to the Bogardus 
scale of social distance in studying the correlates of college students’ atti- 
tudes toward conscientious objectors. McGranahan (86) surveyed differ- 
ences between American and German youth. Ferguson (41) investigated 
sex differences of college students in certain social attitudes, while Gund- 
lach (46) investigated regional differences in the evaluation by college 
students of enemy, ally, and domestic national groups. Attitudinal differ- 
ences among college students of three religious faiths were studied by 
Sappenfield (101). Kerr (63) surveyed the literature with respect to the 
liberalism-conservatism continuum on political and economic issues and 
drew conclusions concerning correlates under such headings as age, sex, 
race, occupation, religion, intelligence, and education. 

Newcomb (91) concluded that both the “attitude climate” and informa- 
tion acquired and retained on a recent social issue are a result of the indi- 
vidual’s mode of adjustment to the community. Cantril (14) studied the 
relationship between intensity and direction of attitudes toward the Negro 
and toward government regulation of business. The influences of attitudes 
upon reading interpretations of high-school pupils were studied by McCaul 
(84). Perry (94) investigated the influence of student dreads upon their 
attitudes toward school subjects. 


Measurement of Opinions 


The verbal expressions of attitudes often defined as opinions are usually
studied by single questions or by short series of related questions. The
data are ordinarily collected by the interview method. Public opinion polls,
conducted largely by journalists, have received acceptance as the major 
source of opinion research. Findings from studies of public opinion appear 
most often in newspapers and nontechnical periodicals, altho the Public 
Opinion Quarterly and other professional journals often carry summariza-
tions of poll results.

Three extensive descriptions of opinion polling methodology are worthy 
of note: by Gallup (44) on the American Institute of Public Opinion, by 
Cantril (13) on the Office of Public Opinion Research, and by Blankenship 
(9) on consumer and opinion research. Other polling organizations are 
the National Opinion Research Center, the Fortune Survey, the Crossley 
Poll, and (at the secondary-school level) the Scholastic Poll and the Purdue 
University Poll. Representative both of the methods and the findings of 
the Index of Public Opinion of the Psychological Corporation are the 
reports by Link (79, 80). Skott (103) discussed attitude research technics 
of the Department of Agriculture, Woodward (125) discussed problems 
encountered and methods used in government research on attitudes, and 
Ferraby (43) presented the “Mass Observation” procedures of an English 
polling organization. 

Crespi (21) conducted a poll of attitudes toward conscientious objectors; 
the attitudes, as measured, were much more favorable than might have been 
anticipated. Williams (123) surveyed regional differences in opinions con- 
cerning international cooperation. Davenport (25) advocated the local, 
systematic polling of high-school students, and use of the results as a 
“guide to guidance.” 


Measurement of Morale 


Scientific studies in that attitudinal area now known as morale, using
technics very similar to those of attitude studies, have increased tre-
mendously in significance as a result of recent world events. The estab-
lishment of the Morale Services Division of the Army Service Forces, and
later of the Information and Education Division, U. S. War Department,
bears out this fact.

Defining optimism as one aspect of morale, Conrad and Sanford (18) 
developed three scales for the measurement of war optimism—one on mili- 
tary optimism, one on optimism concerning consequences of the war, and 
one on general or personal optimism. Estes and Estes (36) standardized 
eleven miniature scales of war morale. Conrad and Sanford (19) studied 
several aspects of war optimism among college students. The war morale 
of rural adolescents and their parents was investigated by Stott (109). 
Cronbach (23) surveyed the general and personal optimism of high-school
pupils.

Sanford and Conrad (99) intensively studied one case each of high and
low national morale, while Henderson and Tinnes (58) surveyed the 
national morale of high-school pupils, college students, and adults. Cantril 
(13) dealt extensively with the measurement of civilian morale.

Harding developed two value-type instruments, a generalizations test
(52) and a problemmaire (53), for which he selected content from philoso- 
phy, social psychology, and sociology. Hart (55) developed a value judg- 
ment scale of happiness-unhappiness, intended to be more valid than the 
monetary scale for rating human experiences and achievements. Korn- 
hauser (68), surveying employee morale methodology, discussed several 
types of interviewing and questionnaire technics, raised some critical ques-
tions, pointed out general difficulties, and outlined the analytical methods
available for morale studies.


CHAPTER VI
Rorschach Methods and Other Projective Technics 


MARGUERITE R. HERTZ, ALBERT ELLIS, and PERCIVAL M. SYMONDS 


During the three years since Symonds and Krugman (159) last reviewed
research in projective technics for the February 1944 issue of this Review, 
there has been no diminution in the interest of psychologists and educators 
in these testing methods. Even a period of the gravest international political 
and economic developments could not apparently dampen the ardor of 
researchers. 

The present review follows the pattern set by the 1941 and 1944 surveys 
in this Review, except that the Rorschach Test is now covered in a separate 
section. 


Rorschach Methods 
General 


A number of noteworthy texts have appeared. Beck’s (12, 13) two 
volumes include descriptions of scoring categories, scoring problems and 
examples, discussion of psychological meanings of categories, and forty- 
three illustrative records covering a variety of personality pictures. Two 
volumes on Diagnostic Testing by Rapaport, Gill, and Schafer (117,
118) aim to present the theory, statistical evaluation, and diagnostic appli-
cation of a battery of tests employed at the Menninger Clinic. Considerable
space is devoted to the Rorschach technic. Bochner and Halpern (18) have
published a revised edition of their book, and Klopfer and Davidson (80) 
have added a supplement to the Klopfer-Kelley manual. 

In two recent surveys of psychologists’ opinions (Kornhauser, 82, and 
Faterson and Klopfer, 39), a majority indicated that the Rorschach Method
has a definite place in the field of general psychology, and that it has 
clinical value if used by trained persons; but vigorous statements were 
also made in terms of lack of objectivity, reliance on personal norms and 
subjective evaluation, lack of validation, limited clinical application, and 
“cultism.” 

Replying to various criticisms of the Rorschach Method (such as the 
lack of objectivity), Munroe (98, 99) formulated and analyzed the method
as a dynamic technic, and emphasized the need for a fairer perspective 
and for more appropriate standards of value. 


Methodological Problems 


In the last three years there has been less research on the objective and 
standardized approach, and more application of the method in various 
fields. Some advances can be reported, however. More efficient methods 
for recording responses by use of code systems have been advanced (Beck,
12; Hertz, 62), and a revised psychogram for summarizing Rorschach
data has been published (Hertz, 65). Scoring of the various test factors
is treated in detail in the new volumes mentioned above. Hertz (62) has
published a revised and amplified edition of her Frequency Tables, con- 
centrating on form accuracy, but including also code charts for locating 
responses, and lists of normal details and of popular and original responses. 
Scoring criteria and other objective data for children have been presented 
by Vorhaus (164) and by Hertz and Ebert (66). A new proposal for ap- 
praising the form level by means of rating scales has been published by 
Klopfer and Davidson (80), who expand the term form level to include 
three form qualities, accuracy, specification, and organization. This last is 
separately handled by Beck with his “Z” factor and Hertz with her “G.” 
Goldfarb (47) presented the only systematic study of organization activity, 
comparing Beck’s “Z,” Beck’s “Z” applied only to F+ responses, Gold-
farb’s revision of the Klopfer-Davidson form-level scoring, and four tests
of abstract ability. None of the correlations computed were significant. 

Schachtel has contributed two valuable theoretical papers; one (143) on 
the dynamic relationships among color, feeling, emotion, and affect; the 
other (144) on the significance of the subject’s definition of the Rorschach 
situation in terms of personal and cultural patterns, which determines his
attitudes and affects his performance.

Problems associated with the popular response factor were considered 
by Hallowell (52), based on his analysis of frequencies of responses in a 
group of American Indian subjects. The psychometric scales for scoring 
Rorschach responses offered by Zubin, Chute, and Veniar (174) provide 
for more exact quantification of the Rorschach Method. The comparative 
merits of this technic and the traditional method remain to be established. 

A more detailed analysis of content of the Rorschach responses has been 
advocated in the last few years. Rapaport and others (118) have attempted 
in their book to systematize conspicuous verbalizations, and to explain the 
psychological processes leading to deviant ones. Interest has been focused 
on specific kinds of content by Goldfarb (48), who emphasized the psycho- 
logical significance of the animal symbol; and by Goldstein and Rothman 
(50), who called attention to the factor of physiognomic attitude as ex- 
pressed in Rorschach responses. 


Norms 


The need for standards of comparison has inspired investigators to 
amass norms for various age groups, mental levels, developmental levels, 
and for different cultures. Normative data are included in the manuals of 
Beck (12) and of Rapaport and others (118) for groups of different mental 
level, of varying personality pictures, and for various diagnostic groupings. 
Several studies include norms for preschool children (Swift, 156), school 
children (Kay and Vorhaus, 78; Vorhaus, 164), for superior seven-year-old
children (Gair, 42), and for six- and eight-year-old children (Hertz and 
Ebert, 66), junior-high-school boys and male college students (Hertzman 
and Margulies, 67), and superior boys and girls (Davidson, 30). Hallo- 
well (52) presents norms for other cultures. 


Unfortunately research in the establishment of norms has of necessity 
been sidetracked by more immediate large-scale problems. There are still
serious omissions for certain age-groups and for certain personality pic- 
tures. While many examiners claim success in proceeding without them, 
one achieves greater precision in interpretation when it is possible to apply 
norms appropriate to the subject. 


Reliability 


There have been few developments in establishing the reliability of the 
Rorschach Method in the last three years. Fosberg’s early study demon- 
strating the high test-retest reliability has been elaborated by a subsequent 
study (40) on how subjects tried to fake results. Even with “test-wise” 
subjects, fundamental Rorschach patterns were little altered. He concluded 
that certainly “test-naive” subjects could not influence the method. 

Swift (155), working with forty-one preschool children, determined 
reliability of the various scoring categories over various time-intervals. 
The results were offered to justify the clinical use of the Rorschach Method 
as a reliable technic. 

While no other systematic studies have appeared, it should be noted in 
the clinical studies discussed later, where the Rorschach Test is repeated 
under experimentally varied conditions, that the stability of the method is 
indicated.


Validity 


A few studies have attacked the problem of validity directly. Many, 
however, utilizing the Rorschach Method for other purposes, have indi- 
cated its validity. 

Studies where the Rorschach Test is given under experimentally altered 
conditions demonstrate the extreme sensitivity of the method to changing 
conditions or attitudes or emotional states, and furnish experimental evi- 
dence of its validity. Thus Stainbrook (150), using a modified form of 
the Rorschach presentation, assembled composite Rorschach psychograms 
for each five-minute interval following the onset of an electroshock con- 
vulsion and demonstrated progressive changes in Rorschach results. Morris 
(96) reported that reliable changes in pre- and post-treatment records 
paralleled the clinical improvement. Again, Rorschach studies made on 
subjects smoking marihuana cigarettes (Williams and others, 171) before 
and after medication indicated changes in patterns which could be verified 
by other technics and by clinical observations. Levine, Grassi, and Gerson 
(87, 88), using the verbal and graphic Rorschach, demonstrated the sensi- 
tivity of the test to mood-changes induced, under hypnosis, by the use of 
emotionally vivid suggestions. 

In comparing Rorschach results with outside criteria, some few studies 
use correlational procedure; others, the matching technic. Still others are 
content with demonstrating general correspondence. Swift’s study (154), 
designed to investigate the correspondence between Rorschach measures of 
insecurity (in terms of ratings and “signs” based on Rorschach records)
and behavioral measures (obtained from a teacher’s rating scale and 
parent interviews) yielded generally negative results. Greater success was 
obtained, however, in another study (153) in the matching of Rorschach 
analyses of thirty preschool children and teachers’ descriptions. Waehner 
(165) matched analyses of the spontaneous drawings and paintings of 
fifty-five college students with Rorschach interpretations, showing correct 
matching in 87 percent of the cases. 
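
As a rough illustration of how a matching result of this kind may be weighed against chance, the sketch below assumes a hypothetical design in which a judge pairs n Rorschach interpretations one-to-one with n sets of drawings. The function names and the sample pairing are illustrative assumptions, not details of Waehner's procedure. Under purely random one-to-one pairing the expected number of correct matches is 1, whatever the size of the set.

```python
import random


def proportion_correct(assigned, truth):
    """Share of cases in which the judge's pairing equals the true pairing."""
    if len(assigned) != len(truth) or not truth:
        raise ValueError("pairings must be non-empty and of equal length")
    hits = sum(1 for a, t in zip(assigned, truth) if a == t)
    return hits / len(truth)


def chance_expected_hits(n, trials=20000, seed=0):
    """Monte Carlo estimate of correct matches under random one-to-one pairing."""
    rng = random.Random(seed)
    truth = list(range(n))
    total = 0
    for _ in range(trials):
        guess = truth[:]
        rng.shuffle(guess)  # a random permutation stands in for blind guessing
        total += sum(1 for g, t in zip(guess, truth) if g == t)
    return total / trials


# Hypothetical set of 10 records; the judge confuses only the last two.
truth = list(range(10))
assigned = [0, 1, 2, 3, 4, 5, 6, 7, 9, 8]
observed = proportion_correct(assigned, truth)
expected = chance_expected_hits(10)
print(observed, expected)
```

A proportion of correct matches far above the chance figure is what gives the matching technic its evidential weight; the size of the sets actually presented to the judge governs how that chance figure translates into a percentage.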

Innumerable studies of validation are based on comparisons of con- 
trasted groups of varying age, intelligence, background, school achieve- 
ment, of different race or nationality, of deviated personality, and of 
various kinds of mental disorders. Many of these utilize the method of 
equating groups for various factors. In the last three years, comparative 
group studies have included: 


preschool children
    loved, not loved, pseudo-loved (Schachtel and Levi, 142)
school children
    high average, six and eight years of age (Hertz and Ebert, 66)
    non-reading children and clinic children (Vorhaus, 164)
    retarded, good, superior readers (Gann, 43)
    superior children, nine thru twelve years of age (Davidson, 30)
    adjusted and maladjusted children (Davidson, 30; Gair, 42)
    children with tics (Piotrowski, 112)
    stutterers and non-stutterers (Krugman, 83; Meltzer, 94; Richardson, 122)
adolescents
    “institution” and “foster home” (Goldfarb, 46; Goldfarb and Klopfer, 49)
    junior-high-school boys (Hertzman and Margulies, 67)
college students
    achieving and non-achieving college men (Steinzor, 152)
    male students (Hertzman and Margulies, 67)
adults
    Kansas highway patrolmen (Rapaport and others, 118)
    mechanical workers, outstanding and non-outstanding (Piotrowski and others, 114)
    malingerers (Benton, 16)
sociological groups
    Spanish and English refugee children (Tulchin and Levy, 162)


Many outstanding contributions deserve special mention. Hertzman and 
Margulies (67) showed reliable developmental changes in personality by 
comparing equated groups of junior-high-school boys with male college 
students. In a study of personality in relation to the economic background 
of intellectually superior children, Davidson (30) found that despite the 
uniformly high intelligence ratings, the group revealed a wide disparity 
in personality patterns. Bright children tended to be well adjusted, but
more often in an introverted than an extroverted way. Little relationship 
was observed between socio-economic status and general personality ad- 
justment. 


Gann (43) compared groups of retarded readers with equated groups
of average and good readers. The Rorschach study revealed more unfavorable
signs of adjustment in the personality of retarded readers than in the
other two groups. Vorhaus (163) developed her thesis that non-readers are 
characterized by higher resistance. 

In Steinzor’s (152) study, the Rorschach Method distinguished between 
achieving and non-achieving groups of college men of high ability, the 
non-achieving group showing fewer signs of adjustment. 

Statistically reliable personality differences between stuttering and non- 
stuttering children were demonstrated on the Rorschach by Krugman 
(83), Meltzer (94), and Richardson (122). 

Goldfarb (46) compared two equated adolescent groups, one whose 
years of infancy had been spent in an institution, the other whose life 
experience had been in foster homes. Rorschach results clearly differen- 
tiated the “institution” children from the “foster home” group, the former 
being more passive and apathetic, less mature, less controlled, less differentiated,
less ambitious, and less capable of adjustment related to conscious
intention or goal. Rorschach results verified experimental and clinical 
findings of other studies, and in turn, could be considered verified by 
them. Again, equating fifteen institution children with a similar group 
of foster home children, Goldfarb and Klopfer (49) showed that early
deprivation was associated with personality fixation on a primitive level, 
independent of intelligence. 

In addition to the above, mentally deficient and mentally disordered 
groups of all kinds have been compared. A limited selection of references 
includes: 


mental deficiency
    brain-injured and non-brain-injured (Werner, 167, 168)
    children of low mental development but with different school success (Abel, 2)
    subnormal Negro and white institutionalized adolescents (Abel, Piotrowski, and Stone, 3)
mental disorders
    neurotics (Rapaport and others, 118; Piotrowski, 111; Ross and McNaughton, 129; Koff, 81)
    preschizophrenics (Rapaport and others, 118)
    incipient schizophrenics (Piotrowski, 111)
    paranoid conditions (Rapaport and others, 118)
    obsessive adolescents (Goldfarb, 45)
    patients with migraine headaches (Ross and McNaughton, 129)


Abel (2) compared two groups of subnormal girls, differentiated on the 
basis of academic school success. Marked differences were observed in 
Rorschach responses, the higher educational group showing better person- 
ality integration than the lower. 

An outstanding contribution was made by Goldfarb (45) in his detailed 
study of twenty adolescents showing obsessional trends in terms of 
Rorschach patterns and qualitative aspects of the Rorschach record. Equat- 
ing the obsessional adolescents with a similar group of unselected children 
referred for educational guidance, he identified eight reliable symptomatic 
personality trends in obsessional adolescents. Rorschach results in conjunction
with case study, clinical observations, interviews, and other test
data enabled him to present a valuable picture of the dynamic personality 
structure of the obsessional adolescent. 

The trend to establish “signs” which are more frequent in one group 
than in controlled or contrasting groups has continued in the last few years, 
and attempts have been made to establish statistically the extent of their 
diagnostic usefulness. Ross and Ross (130) combined and weighted several 
signs occurring more often in “neurotic” and “organic” subjects than in 
controls, thus obtaining a general “instability” rating and a general “dis- 
ability” rating, which were validated with clinical findings and with se- 
lected subtests of the Binet. The authors reported that these ratings differ- 
entiate groups reliably. 
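
The general idea of combining weighted signs into a single rating can be sketched as follows. The sign names, weights, and cutoff here are hypothetical placeholders chosen only for illustration; they are not the signs or weights used by Ross and Ross.

```python
def weighted_sign_score(signs_present, weights):
    """Sum the weights of those signs that occur in one record."""
    return sum(weights[s] for s in signs_present if s in weights)


def exceeds_cutoff(score, cutoff):
    """True when the rating reaches the (hypothetical) cutoff value."""
    return score >= cutoff


# Hypothetical weights: a sign seen more often in the clinical group
# than in controls would receive the larger weight.
weights = {"sign_a": 2, "sign_b": 1, "sign_c": 3}
record = {"sign_a", "sign_c"}

score = weighted_sign_score(record, weights)
flagged = exceeds_cutoff(score, cutoff=4)
print(score, flagged)
```

Any cutoff of this kind would need to be checked on new groups before it could be trusted, since weights fitted to one sample tend to look better there than elsewhere.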

The “sign” procedures utilized in diagnosing schizophrenia, designated 
as “pathognomic” and “tabular,” were criticized by Piotrowski (111) be- 
cause they lay insufficient stress on the systematic, dynamic, and mutual 
interdependency of Rorschach components. 

Both Davidson (30) and Gann (43) have developed reliable batteries 
of “signs” for evaluating good adjustment in school children, which they 
applied with success in their respective studies. Piotrowski and others (114) 
identified specific Rorschach signs which, in the sample studied, discrimi- 
nated between outstanding and non-outstanding mechanical workers. 

Unfortunately the use of signs has sometimes been abused. Too often 
control and contrasting groups have not been utilized. Many of the “signs” 
require more extensive study and must be verified by application to new 
and larger groups. 

Validation continues also in terms of studies which demonstrate a high 
degree of correspondence between Rorschach analyses and other criteria, 
such as case records, test data, teachers’ reports, psychiatric diagnoses, 
various clinical data, and results from other projective technics; many 
of these studies utilize the blind-interpretation technic. Thus Schachtel 
(141) showed close correspondence between Rorschach records obtained 
from the same children at different ages and other projective data and 
behavior records. Munroe, Lewinsohn, and Waehner (104) showed good 
agreement between clinical observations and results of three projective 
methods, the Rorschach, graphological analysis, and art technic. Using 
various personality tests, including the Rorschach, Michael and Buhler 
(95) validated results against psychiatric diagnoses. 

Again, objective validation of the method is seen in DuBois’ (34) 
blind analyses of records of the people of Alor, which corresponded to the 
descriptions offered independently by the ethnographer who lived among 
them. 

The literature is replete with individual case studies which demonstrate 
the close correspondence between Rorschach interpretations and validating 
material from non-Rorschach sources. The new manuals contain many 
such case studies. 

Finally, studies of the method as an instrument of prediction offer probably
the best method of validation. Munroe (100, 101, 102) has contributed
immeasurably in this direction by her studies of Rorschach results from the 
freshman class at Sarah Lawrence College. The Rorschach findings were 
compared with independent criteria, such as academic failure, referrals 
to psychiatrist, and problem behavior observed by teachers. Ample evi- 
dence was reported of the high degree of success in predicting the criteria. 
In addition, the shock treatment studies continue to demonstrate the prognostic
power of the method (Morris, 96).


Modifications and Supplementary Technics 


In the last few years there have been many modifications and extensions 
of the Rorschach Method. Harrower-Erickson and Steiner (56) have pub- 
lished their manual covering both the Group procedure and the Multiple 
Choice Technic. As already indicated in detail (61), lack of measures of 
reliability, lack of adequate validating material, inadequate norms, and 
the generally low scientific standards of research compel us to defer judg- 
ment as to the value of the Group Method even as a screening instrument. 
Tho Abel (1) has reported some success with Sender’s Group Rorschach 
Method in a vocational high school, and Stainbrook and Siegel (151) 
found a Group Method valuable in differentiating southern Negro and 
white high-school and college students, research on the Group Method has 
not yet followed thru to establish all phases of the method on a firm basis. 
Buckle and Cook described their development of the Group Method. 

Studies have yielded even less promising results for the Multiple Choice 
Test of Harrower-Erickson and Steiner than for the Group Method (Chall- 
man, 24; Due, Wright, and Wright, 35; Balinsky, 7; Jensen and Rotter, 
76; Malamud and Malamud, 91, 92; Wittson, Hunt, and Older, 172). 

Experiments with self-recording technics have been suggested by St. 
Clair (132) and Munroe (97), who conclude that they warrant further 
exploration. Other supplementary tests suggested to provide additional 
leads as to basic personality trends include the Free Association Test de- 
scribed by Janis and Janis (75), based on free associations to the 
Rorschach blots, and the Animal Association Test by Goldfarb (48), who 
would study the symbolic significance of animal responses in the 
Rorschach. Hutt and Shor (73) have suggested extension of, and supplementary
procedures for, the “testing-the-limits” phase of the Rorschach
administration. 

Two parallel series of blots have been proposed: the “Psychodiagnostic 
Inkblots” by Harrower-Erickson and Steiner (57), which are presented 
without adequate standardization; and the Marseille Rorschach Mail 
Interview (93), for which no research is available, to the writer’s knowledge.


Scope of Application


As has been suggested, the use of the Rorschach is widespread, covering 
broad fields and a vast number and variety of problems in the last few 
years. These have been surveyed in a recent paper by Hertz (64) on the
significance of the Rorschach Method in the mental hygiene program. The 
application of the method in schools has been reviewed by Cowin (29), 
who emphasized specifically its role in clinical service, in screening those 
children who require study and treatment, in diagnostic study of the more 
seriously disturbed, in suggesting direction of treatment, and in evaluating 
results. 

In the field of vocational guidance and counseling, application of the 
Rorschach has increased. Within certain areas, it has been shown to reveal 
specific abilities, aptitudes, and talents. Prados (115), for example, identi- 
fied several common characteristics in a group of professional artists, and 
showed how the method could be used in studying the dynamics of artistic 
creativeness. The best use of the method, however, is in describing the 
kind of functioning personality an individual possesses, and revealing 
those traits of personality which help or hinder vocational adjustment. 
Thus Balinsky (8) was guided in his counseling in a public service em- 
ployment agency by Piotrowski’s (113) Rorschach formula for revealing 
traits of personality essential to educational and vocational success. 

The method has been used in anthropological and sociological studies 
with interesting results. Thus, differences between Negro and white groups 
have been reported by Stainbrook and Siegel (151), and by Abel, Piotrow- 
ski, and Stone (3). Tulchin and Levy (162) used the Rorschach Method 
to obtain a better understanding of the personalities of Spanish and English 
refugee children. Rorschach analyses are included in anthropological 
studies 34, 53, and 161. 

The application of the Rorschach Method in the social case-work field 
was considered by Schmidl (145, 146). Siegel (148, 149) described its
use, by a social agency, in diagnostic procedure, in the formulation of 
treatment plans, and in selecting clients for group therapy and evaluating 
their response to it. Application of the Rorschach in a program of group 
therapy was also treated by Epstein and Apfeldorf (38). 

The most extensive application of the Rorschach Method has been, of 
course, in the psychopathological area. Beck (11), Rapaport and others 
(118), Koff (81), Michael and Buhler (95), and many others exhibit how 
extensively the method is used as an aid in differential diagnosis of mental 
deficiency, the neuroses, the psychoses, and intraorganic pathology. Hertz 
(63, 64), Kamman (77), Siegel (148), and others emphasized how the 
method is employed as a means of rapprochement to the patient, as an 
aid for determining the accessibility of the patient to treatment, as a 
therapeutic agent since it permits the patient to find emotional release, 
and as a guide to the kind of treatment best fitted to the particular 
individual. 

In passing, we may mention that the Rorschach has found use in the
armed forces for research, for diagnostic purposes, and for the objective
evaluation of therapeutic programs.


Conclusion


This review of published reports on the Rorschach Method indicates 
the progress which has been made during the last three years in systematiz- 
ing research procedures, amassing scoring criteria and norms, using more 
scientific methods of handling data, adopting more adequate controls, 
employing statistical methods where they are applicable, and in applying 
scientific procedure to clinical validation. Today, the Rorschach represents 
one of the better methods for understanding the nature of personality, and 
is one of the more valuable instruments for use in clinical psychology. 

While much progress has been made, there are still numerous problems 
in need of further exploration and verification. Unfortunately research has
failed to keep pace with application and therapeutic usage. Standards of 
research have not always been kept at a high level. Dangerous trends have 
developed, not only in reduced emphasis on fundamental research, but 
in several other directions; namely, attempts to establish shorter forms 
of administration; attempts to over-simplify scoring and interpretation; 
premature utilization of group technics in advance of adequate validation; 
and the modification—really the emasculation—of the method to permit 
untrained persons to use it. These trends must, of course, be evaluated in 
terms of standards of wartime and of the chaotic years that followed. It is 
hoped that with the passing of the pressures of war and its aftermath, 
research will resume its former high standards, and that emphasis will 
again be placed on broad preparatory training in the method. The 
Rorschach Method cannot be effectively utilized by untrained personnel. 
Its efficient use requires training in the method, psychological and clinical 
knowledge, experience, skill, and the understanding of human problems. 
If workers in the field maintain high standards of research and application,
the method will serve well the psychological and psychiatric needs of these
postwar years.


Other Projective Technics 
General Papers 


The most comprehensive, recent general study of projective technics 
is that by Sargent (140). She critically reviewed all the existing technics, 
and concluded that, while projective methods are not standardized, they 
truly deserve increased attention and exhaustive research. White (169) 
recently published a general survey and bibliographical review of imagina- 
tive productions, including sections on the Rorschach, the Thematic Apper- 
ception Test, story completion, play technics, and drawing and painting 
procedures. 

Cattell (22) published a paper dealing with the design of projective 
tests. His main point was that the term “projection” has been too cavalierly 
employed in many recent studies; and that, in consequence, the free asso- 
ciation and fantasy elicited by several so-called “projective” technics 
have little connection with projective interpretation of the situation. Cattell’s
paper should serve as a good antidote for an over-enthusiastic and
lighthearted approach to the construction of projective tests. 

Several papers appeared which commented on the use of different kinds 
of projective technics in specific clinical situations. Sarason (137) surveyed
the value of projective methods in cases of mental deficiency, and
reported that they served to illuminate the “total personality” instead of 
merely isolated intellectual aspects of functioning. Hutt (72) showed spe- 
cifically how projective tests were employed in army medical installations. 
Holzberg (68, 69) wrote on the uses of projective technics in military 
clinical psychology. He warned against the limitations and dangers of 
projective tests when interpreted by untrained individuals, but granted 
their usefulness when properly employed. 

Several studies also appeared which employed two or more different 
projective technics in an attempt to bring out valuable experimental find- 
ings. Thus Murray and Morgan (106), in a clinical study of the sentiments 
of Harvard students, employed numerous psychological technics, including 
two forms of the Thematic Apperception Test (TAT), another picture 
selection test, and a sentence completion test. Despert (32) employed the 
Duss Fable Method as well as play and drawing materials in her psychosomatic
study of fifty stuttering children. Munroe (103) utilized graphological
analysis, appraisal from spontaneous drawings, and the Rorschach
in her special diagnostic study of one girl. Other studies were made by 
Munroe, Lewinsohn, and Waehner (104) and by Sanford and Cobb (133). 
It would seem that the multiple use of projective technics in research on 
personality is becoming more the rule than the exception. 


Thematic Apperception Test 


The Thematic Apperception Test remains (aside from the Rorschach) 
the most popular of the various projective technics. Considerable work was 
done during the past three-year period in regard to its construction, 
evaluation, and applications. 

In the field of construction, Murray (105) brought out the third revision 
of the original set of thematic pictures, as well as a revised and expanded 
manual for its administration and scoring. Combs (26) presented his own 
method of analysis for the TAT in terms of situations described, goals 
striven for, frustrations of these goals, and action patterns used by the 
individual for attempted resolutions. Clark (25) devised a method of ad- 
ministering and evaluating the TAT in group situations, and found a sub- 
stantial relationship between free responses and responses to a checklist of 
prepared stories. Rapaport, Gill, and Schafer (118), in the second volume 
of their work, reported a qualitative treatment of the TAT responses, and 
listed trends in responses that are diagnostically important for different 
kinds of clinical syndromes. Jacques (74) devised a rapid method of ana- 
lyzing TAT stories, which he tested with soldiers. Lasaga y Travieso and 
Martinez-Arango (85) published a series of suggestions regarding the
scoring and administration of the TAT, including several new technics. 

Several experimental evaluations of the TAT were also reported in the
literature during the three-year period under consideration. Bellak (14) 
designed a study in which subjects took the first five TAT pictures under 
normal conditions, and the second five while criticisms of their stories 
were being made. He concluded that “projection is in part a function of 
the stimulus” (14, p. 370). Loeblowitz-Lennard and Riessman (89) studied 
factors related to the recall of TAT pictures after they had been used in 
the standard procedure. They found that the recall description of a picture 
is a condensation of the story told in response to the picture, with the 
principal themes brought into sharp focus. Combs (27) has studied the 
“validity” of interpretations of autobiography and TAT material by 
comparing analyses made by different judges. Agreement between two 
analysts was from 50 to 60 percent; agreement of an analyst with himself 
at a later date 63 to 68 percent. It should be realized that the comparison 
of interpretations of the same material may differ from the comparison 
of projective materials obtained in independent case studies of an indi- 
vidual. Sarason (136), in a study of dreams and TAT scores, found that 
the major themes in his subjects’ dreams were generally the same as those 
in their TAT stories, and concluded that the validity of thematic interpre- 
tation was thereby demonstrated. Renaud (121) emphasized that fantasy 
is sensitive to age variations, and this must be taken into account in inter- 
pretation. Other studies evaluating the TAT were carried out by Harrison 
and Rotter (54) and by Kutash (84). Balken (9) summed up some of 
the recent studies on the TAT and found that they generally demonstrated 
it to be a valuable psychological technic. 
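
Agreement figures of the kind reported by Combs can be understood as simple percent agreement. The sketch below assumes, hypothetically, that each analyst assigns one category label to each item of material; the labels and data are invented for illustration and do not reproduce Combs's scoring scheme.

```python
def percent_agreement(judge_a, judge_b):
    """Percentage of items on which two sets of judgments coincide."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("judgment lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return 100.0 * matches / len(judge_a)


# Hypothetical category labels assigned to ten story segments.
analyst_1 = ["goal", "frustration", "goal", "action", "goal",
             "action", "frustration", "goal", "action", "goal"]
analyst_2 = ["goal", "frustration", "action", "action", "goal",
             "goal", "frustration", "action", "frustration", "goal"]

agreement = percent_agreement(analyst_1, analyst_2)
print(agreement)
```

The same function serves for agreement of an analyst with himself at a later date, by passing his first and second sets of judgments. Percent agreement makes no allowance for matches that would occur by chance, which is one reason such figures must be read with caution.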

Applications of the TAT to clinical work and clinical studies were 
fairly numerous during the period under consideration. Richardson (122) 
found that it failed to distinguish between stutterers and non-stutterers 
in many major areas of personality. Balken and Van der Veer (10), on 
the other hand, found it helpful in the clinical study of neurotic children. 
White, Tompkins, and Alper (170) reported the TAT useful in a compre- 
hensive personality study of one subject. Sarason and Sarason (134, 135) 
found it very helpful in the diagnosis of feebleminded and mentally defici- 
ent children. Kendig (79) noted its value for diagnostic purposes as well 
as prognosis and therapy. 

In non-clinical applications, the TAT has not as yet come into wide
usage. Frenkel-Brunswik and Sanford (41), however, did make an inter- 
esting sociological application of it. In their study of personality factors 
in anti-Semitism they found that the thematic apperceptions of anti-Semitic 
girls brought out the ambivalent attitudes of the girls to parental figures, 
and helped explain the narrow, superego-ridden personalities of these sub- 
jects. Proshansky (116) also cleverly utilized the TAT to secure scores on 
attitudes toward labor for two groups of subjects, and found that these 
scores correlated .67 and .87 with a conventional attitude scale obtained
by the questionnaire method. Further experimentation along these lines 
would seem at present desirable. 
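
Coefficients such as these are presumably product-moment correlations between the projective scores and the questionnaire scores; the source does not state the formula, so the sketch below is an assumption on that point, and the score lists are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six subjects on the two instruments.
projective_scores = [3, 5, 2, 8, 7, 4]
questionnaire_scores = [10, 14, 9, 20, 15, 13]

r = pearson_r(projective_scores, questionnaire_scores)
print(round(r, 2))
```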


Other Picture Projective Tests


Several other projective tests utilizing various forms of pictures have 
recently come into use, and experiments with them have been reported in 
the literature. Rosenzweig (126, 127, 128), in particular, has done a good 
deal of work on his Picture-Frustration Test. He has found the test to have
some degree of reliability and validity, and has issued some norms and 
scoring samples. He freely admits, however, that the P-F Test does not 
reveal profound or extensive knowledge of human personality, since its 
modest scope limits it only to certain aspects of social adjustment. Symonds 
(158) has made a preliminary report of his test of forty-two pictures de- 
signed specifically for use with adolescents. He reported the pictures as 
differentiating on several counts between boys and girls, and between older 
and younger children. He concluded that the psychological themes revealed 
by the pictures “in a representative fashion tap the major psychological 
drives to be found in the fantasies of adolescents in our culture” (158, 
p. 328). Wekstein (166) designed and reported upon a picture test con- 
sisting of two sets of Disney-like figures, such as dwarfs, fairies, elves, 
nymphs, and ectomorphs. The purpose of having such innocently childlike
figures, he stated, is to lull the subject into a sense of security, encourage
him to identify himself with seemingly innocuous figures, and thus
tap his innermost thoughts. Harrower and Grinker (55) and Chalke (23) 
reported validation experiments with the Harrower Stress Tolerance Test, 
which includes a set of pictures in some ways analogous to the TAT pic- 
tures. Goitein and Kutash (44) have published a report of the standard- 
ization on psychiatrically known populations of several unusual picture 
tests of projection. Leuba and Lucas (86) used a group of six pictures to 
investigate the effects on their subjects of three different moods—happy, 
critical, and anxious. They found that common sense and clinical insight 
are apparently correct in assigning to moods, feelings, and attitudes, a 
major role in the determination of intellectual processes. Raven (119) has 
experimented with a projective device on which a subject is confronted 
with a sketch of a person somewhat resembling himself and is asked a 
series of questions about what this hypothetical individual likes, is inter- 
ested in, is afraid of, is worried about, etc. Deri (31) has described the 
Szondi Test, which consists of photographs (representing eight different 
types of mental disease) among which a subject makes a selection on the
basis of liking and disliking. The evidence of the diagnostic value of this
test is not at all convincing.


Play Technics 


Projective play technics have continued to be employed in published 
researches. Howard (70) administered a play interview technic to twenty- 
three kindergarten and twenty fourth-grade pupils and found that the 
amount and quality of fantasy material spontaneously given by the chil-
dren indicate that this is an effective technic for uncovering their attitudes
and interests. Bach (6) made an intensive analysis of the doll play fanta-
sies of a group of young children, and discovered profound differences
in type and amount of these fantasies to exist between boys and girls. He 
devised a clear-cut standardized procedure of eliciting the projective play 
fantasies of his young subjects, and by its use was able to qualify and 
classify their fantasy responses reliably and objectively. Pintler, Phillips, 
and Sears (110) attributed sex differences in the projective doll play of 
preschool children to a sex-typing process dependent on social learning 
during early childhood. Hay (58) studied the case of a persistently truant 
boy by means of projective play therapy. Sargent (138) utilized doll play 
with a nine-year-old boy who was presumably normal, and found him to 
be projecting his personal problems in the same way that so-called neurotic 
children do in a therapeutic session. She concluded that this supports the 
contention that children, of their own accord, play out their conflicts and 
problems. Henry and Henry (60) employed David Levy’s doll play technic 
with twenty-four children from a primitive Pilaga South American Indian 
tribe. They found sibling rivalry patterns very much like those found in 
our own culture. 

In addition to these uses of projective play technics with children, there 
were also a few reports, such as that by Renaud (120), of play projection 
employed with abnormal adults. 


Drawing and Painting Technics 


A good many reports have lately appeared in the literature dealing with 
drawing or painting as projective technics of personality measurement. 
Bender and Rapaport (15) collected the animal drawings of children over 
a number of years and reported that children who drew certain types of 
animals could often be placed in distinctive personality groupings. Thus, 
drawings of ferocious attacking animals were drawn by children with 
punitive fathers, who had inverted oedipus complexes. Buck (20) has 
experimented with the drawing of a house (H), tree (T), and person (P) 
as a projective device. Elkisch (37) subjected the drawings of eight chil- 
dren to a projective analysis, and found that the drawings of three whose 
sociometric ratings were high gave evidences of good adjustive ability, 
while the drawings of three whose ratings were low, gave projective evi- 
dence of maladjustment. One other child whose sociometric rating was 
low gave evidence of good adjustment in the drawings; and one whose 
rating was medium showed maladjustment. Hellersberg (59) brought out 
the Horn-Hellersberg projective drawing test, in which the subjects are 
given guide lines from which to make drawings. Taylor (160) analyzed 
the free drawings of American and Indian subjects, and reported indica- 
tions of the existence of cultural influences affecting behavior in the free 
drawing situation. Hurlock (71) studied the spontaneous drawing of ado- 
lescents and stated that these drawings reflect their interests, which are dif- 
ferentiated from the interests of younger children. Waehner (165), in a 
detailed investigation of spontaneous drawings and paintings of college
girls, noted that form-analysis of spontaneous drawings promises valuable
findings in relation to understanding the inner dynamic of performance
on the Rorschach.

In the area of painting, Alschuler and Hattwick (4) explored easel 
painting as an index of personality in preschool children, and found that 
while the paintings themselves may not safely be used to predict behavior, 
they may give possible clues to understanding the child’s emotional flow 
and supply some of the missing clues needed to build a workable organ- 
ismic personality picture. Brick (19) published a paper on the mental 
hygiene value of children’s art work, in which she held that projective 
interpretation of children’s paintings provides valuable material for per- 
sonality study and for diagnosis of acute and deeper-seated problems. 
Naumburg (107), in a study of children’s art expression and war, found
that repetitive and stereotyped art productions diminished as boys gained 
confidence in themselves and in their abilities to create. She also found, 
in her study of the art expression of a behavior problem boy (108), that 
the unconscious expression of his fantasy in free art work acted as an aid 
in both diagnosis and therapy. Arlow and Kadis (5) published a study 
of finger painting in the psychotherapy of children and noted that the way 
in which anxiety-producing fantasy reappeared and was elaborated in the
finger painting of the children was most impressive. 

In the area of design, Diamond and Schmale (33) adapted the Lowen- 
feld mosaic test to projective interpretation, and discovered that the abil- 
ity of subjects to produce spontaneously an idea for a pattern, and to 
execute that idea within the limits of the test materials, utilizing both color 
and form to produce a recognizable gestalt, correlates with and reflects 
the personality integration of the tested individuals. 


Handwriting Technics 


Graphological projective technics apparently aroused little interest in the 
period under consideration. The most important study was one by Pascal 
(109). He experimented on twenty-two college men, and had them psycho- 
logically rated on thirty-six of Murray’s variables and on a good many 
handwriting variables. He reported that ten of the handwriting variables 
were shown to be significantly related to the personality variables, and 
contended that this conclusively established a significant relation between 
handwriting and personality. Considering, however, the small number of 
cases used, and the author’s not assigning any specific handwriting char- 
acteristic to a specified personality variable, his conclusions must be taken 
cautiously. 

Cooper (28) minced few words in censuring Eliasberg’s (36) paper on 
“political graphology” for “its benign assumption that it fits into the
framework of scientific method” (28, p. 263). In view of the paucity of
objectively sustained data set forth by Eliasberg, Cooper’s position in this
connection is well taken.


Miscellaneous Technics


In addition to the papers recently published on the usual types of pro-
jective technics, several new procedures and applications have been brought
out during the past three years. 

Several story or plot completion tests have been devised and presented 
in the recent literature. Wolfenstein (173) administered six stories, each 
with an alternative realistic and unrealistic ending, to psychotic, neurotic, 
and normal subjects. She found that the psychotics were mainly unreal- 
istic, while the latter two groups did not appear to differ significantly. 
Roody (124, 125) devised a plot completion test for purposes of analyzing 
a pupil’s attitudes toward fictitious situations and, by implication, toward 
his own life problems. She reported reliabilities of .835 and .914 for her 
test. A study by Billingslea (17) of the Bender-Gestalt should discourage 
the use of this test with neurotics; its value in the study of psychotics, 
however, especially where there is suspicion of brain damage, is still chal- 
lenging. Rotter (131) experimented with a simple method of scoring the 
sentence completion test, which yielded a self-reliability of .85 and an 
inter-scorer reliability of .89. As a measure of emotional stability it had 
a correlation of .61 with a psychologist’s ratings of 200 patients. 

Rohde (123) did some further work on the Rohde-Hildreth sentence 
completion technic, and found that correlations between ratings of 670 
high-school students’ responses in sentence completion items and the rat- 
ings of the combined judgments of their teachers, the experimenter, and 
other sources were .79 for the girls and .82 for the boys. 

Shor (147) reports the use of a sentence completion test which he calls 
the SIC (self-idea-completion) Test. He interprets this test by noting areas 
of refusal, resistance, and recurring atypical associations. 

Sargent (139) tried an experimental application of projective principles 
to a paper-and-pencil personality test. She presented a list of conflict situa- 
tions to college students and mental hospital patients, asking them to 
write, in any way they wished, on the subject, “What did he do and 
why?” and “How did he feel?” Sargent found certain significant differ- 
ences between papers written by mental patients and college students; and 
concluded that the results offered strong evidence that the mechanism of 
projection operates in a paper-and-pencil situation of the type used. 

Loeblowitz-Lennard and Riessman (89) propose a projective test of 
social attitudes consisting of true-false, multiple choice, and completion 
items on which the emphasis is shifted from the present to the past, from 
the personal to the impersonal, and from the organized to the ambiguous. 

Symonds (157) studied the autobiographies of teachers in terms of pro- 
jective principles, and specifically examined and discussed their need for 
autonomy, cognizance, and blamavoidance. 

Hall (51) has attempted to validate nocturnal dreams as expressions of 
personality by the methods of (a) social agreement, (b) internal agree-
ment, (c) external agreement, (d) prediction, and (e) postdiction.


Summary


The quantity and quality of the published material on projective technics 
for investigating personality have been sufficiently high during the past 
triennial period to warrant continued optimism concerning the growth 
and development of this lusty psychological youngster. It would certainly 
seem premature to celebrate the coming-of-age, or even the adolescence, 
of projective methods. Much remains to be accomplished in construction, 
evaluation, and standardization. Only the surface has been scratched in 
applications. But great interest in these projective technics, and a will to 
fight thru the problems and difficulties of a rapidly developing field, 
obviously exist among an increasing number of investigators. If that will 
persists, the way to maturity should not be too long.


CHAPTER VII





Other Devices for Investigating Personality 


HAROLD H. ABELSON and ALBERT ELLIS 


The present chapter is devoted to character tests and to other technics
of personality testing not covered in the previous chapters. 


Character Tests 


Research on character tests has suffered, in recent years, from the lack 
of extensive character-testing programs. If one digs deeply enough, how-
ever, some recent literature on character testing may be unearthed. Notable
among these studies is Jones’ (37) comprehensive review of character 
development in children. Jones concluded that, while each person seems 
to acquire his character in conformity with the usual laws of condition- 
ing and learning, “the possibilities for such acquisition and the broad 
limits thereto are provided by nature” (37, p. 747). 

The main experimental project in character testing and education now 
in progress is that being conducted by the Schenectady University-West- 
minster Character Research Project. The entire issue of Religious Educa- 
tion for November-December 1944 was devoted to the findings and criti- 
cisms of this program. Ligon (42), in particular, reported on the attitude 
scales and questionnaires employed in the study. Among the critics of the 
program, Tilton (71) was especially unenthusiastic about the measure- 
ment technics being employed, and questioned their practicality. 

Two new character tests have been recently reported on in the literature. 
Pauli (51) described a performance test used at the University of Munich 
that aims at disclosing character qualities which influence achievement. 
The test requires the subjects to do continuous addition for a sixty-minute 
period, and gives a “character quotient” for each individual. Pieron (53) 
devised an honesty-rating procedure whereby pupils are given back their 
own tests to mark, after these have already really been marked by their 
teachers; the pupils are rated on the number of changes made. 

Factor analysis of character measurements began to come into more 
widespread use in the period under consideration. Hsu (33) took Moore’s 
character questionnaire and found that sixteen general, and more or less 
primary and independent, traits could be obtained from its original fifty- 
seven headings. Keckeissen (39) then utilized Hsu’s sixteen trait list and 
was able to isolate two superfactors from it, which she termed “tendency 
to sadness” and “emotional stability.” Brogden (10), in a multiple-factor 
analysis of a different set of character measurements, emphasized the role 
of “situational” factors, i.e., factors limited mainly to a particular type
of situation as contrasted with factors of greater and more intrinsic gen-
erality. These investigators used the term “character traits” in a broad
sense, and dealt as much with personality as with character testing.

In the field of application, Bollinger (5), in his study of the social
impact of the teacher on the pupil, used Wood’s Right Conduct Test on 
pupils in three different schools, and found that there was a close simi- 
larity among the three schools in the scores earned on it. Jones (38) 
studied the honesty of 304 subjects on five character tests given twelve 
years apart, and found a coefficient of contingency of .37 between their 
adolescent and adult honesty. 
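
The coefficient of contingency mentioned above can be illustrated with a brief computation. The sketch below assumes Pearson's form of the coefficient, C = sqrt(chi-square / (chi-square + N)); the report does not say which variant Jones used. The function name and the small table of counts are hypothetical illustrations and are not Jones' data.

```python
import math


def contingency_coefficient(table):
    """Pearson's coefficient of contingency C for an r x c table of counts.

    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square: squared gaps between observed and expected counts,
    # each divided by the count expected under independence.
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_sq += (observed - expected) ** 2 / expected

    return math.sqrt(chi_sq / (chi_sq + n))


# Hypothetical counts (rows: honest / dishonest in adolescence,
# columns: honest / dishonest in adulthood). Invented for illustration.
example_table = [[90, 60],
                 [50, 104]]
print(round(contingency_coefficient(example_table), 3))
```

It may be noted that C cannot reach 1.00; for a 2 x 2 table its upper limit is about .71, so that a value such as .37 is less modest than it would be for a product-moment correlation.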


Sociometric Methods 


In a previous issue of this Review Strang and Wollner (69) reported 
on a number of studies employing sociometric and allied technics. Addi- 
tional studies appearing since the period covered by their review are 
reported here. 

A test of twenty questions which directed the children in a group to 
choose companions and playmates for various social and other functions 
was described by Jastak (36). Mitchell (46) devised an interesting but 
subjective device that called for the selection of classmates conforming to 
popularly labelled and briefly described characteristics of various person-
ality “types,” such as “Alibi Ike” and “Squirrel in the Cage.” More com-
plex instruments were used by Ames (1), who modified Smalzried’s Social
Acceptance Scale in such a manner as to yield information with regard 
to both social acceptance and awareness of acceptance status. Awareness 
of one’s social acceptance was, in the group studied, found to be limited.
Jacobs (35) had seventeen girl employees express preferences concerning 
whom they would like to work with; he concluded that much valuable 
information was brought to light by this method, despite its low correla- 
tion with the findings of the Miller-Murray Personality Test. 

Factors associated with mutual friendships and non-mutual relationships 
were investigated by Potashin (55) and by Bonney (8), both of whom 
employed sociometric technics along with other measures depicting intel- 
lectual and social status. Potashin compared objective characteristics 
(height, academic achievement, residence) with social relations as revealed 
in sociometric choices. She also set up and recorded an experimental 
interview with pairs of friends and non-friends. Bonney selected two ex-
treme groups of “very mutual” and “very unreciprocated” pairs of ele-
mentary, secondary, and college students. In intelligence, in achievement,
and in certain parts of the personality inventories which were applied, 
mutual friends were found to be no more alike than unreciprocative pairs. 

A number of other studies were found that dealt with the characteristics 
of socially successful and socially unsuccessful children (Bonney, 6 and 7; 
Northway, 49; and Kuhlen and Lee, 41). Frankel and Potashin (27) sur- 
veyed the literature on friendships and social acceptance among children. 

In a brief but penetrating analysis of two monographs on sociometric 
procedure (12, 48), Criswell (19) discussed critically the application of 
chance formulas to the interpretation of sociometric problems. Other studies 
of graphic, statistical, and other methodological problems related to socio-
metric procedures were made by Criswell (17, 18), Bronfenbrenner (11),
and Moreno (47). The great bulk of investigations employing the socio- 
gram were reported in Sociometry. 


Checklists and Behavior-Rating Devices 


Checklists and behavior-rating devices designed to determine social ma- 
turity, self-help, or quality of behavior were developed or further studied 
from preschool to adult levels. On the basis of reports of nursery school 
teachers in her inservice seminars, Peller (52) prepared a useful checklist 
of significant symptoms in the behavior of young children. This list was 
prepared from the point of view of dynamic psychology rather than mere 
habit training. Patterson (50) reported the use of the Vineland Social 
Maturity Scale at the Fels Research Institute, where it was applied to a 
group of normal children six months to ten years of age. Patterson con- 
cluded that the Vineland Scale appears to be “a reliable and fairly valid 
measure of an aspect of development which, at the level studied, might be 
called independence (in self-help), or self-sufficiency” (50, p. 286). A simi-
lar study was made by Doll (22), whose sample, however, consisted of insti-
tutionalized feebleminded subjects. Doll (21) also reported an exploratory
study of the age-trend of social maturity ratings for both normal and 
feebleminded persons aged sixty-five years or over. Weitzman (75) con- 
structed a group test of social maturity, largely in multiple-choice form, 
that utilized self-reported information about the subject’s independence 
and responsibility of behavior in a variety of personal, social, and leisure 
activities. The test was designed for the age range of sixteen thru twenty- 
five. Shuey (66) studied the correlation between ratings of college students 
on the Wilke Personality Rating Scale by from four to fourteen college 
instructors; the correlation between average ratings of half the instructors 
with those of the other half ranged from .73 to .82 for the several traits, 
after application of the Spearman-Brown formula. Shuey concluded that, 
for adequate individual differentiation, approximately twenty raters would 
be required. 
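
The relation between the number of raters and the reliability of their pooled judgment may be sketched with the Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), where r is the reliability of the existing pool and k the factor by which the pool is enlarged. The sketch assumes this standard form; the function names and the figures in the usage lines are hypothetical and are not taken from Shuey's data.

```python
import math


def spearman_brown(r, k):
    """Reliability expected when a measure of reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


def lengthening_factor(r, target):
    """Factor k by which the pool must grow for reliability r to reach target.

    Obtained by solving the Spearman-Brown formula for k.
    """
    return target * (1 - r) / (r * (1 - target))


# Hypothetical illustration: half of the raters correlate .60 with the
# other half, so the whole pool (k = 2) is stepped up accordingly.
half_pool_r = 0.60
full_pool_r = spearman_brown(half_pool_r, 2)
print(round(full_pool_r, 3))

# Raters needed to reach an assumed target of .95, starting from a
# hypothetical pool of 8 raters with the reliability just computed.
current_raters = 8
k = lengthening_factor(full_pool_r, 0.95)
print(math.ceil(current_raters * k))
```

The second function shows why the demand for raters rises steeply as the desired reliability approaches 1.00: the required factor grows in proportion to target / (1 - target).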


Tests of Mental Ability Used as Personality Tests 


While the main function of a test of mental ability is usually to measure 
intelligence, such tests have recently been applied increasingly to the 
measurement of personality as well. Much of this usage stems from 
Wechsler’s (74) claims, in the third edition of his Measurement of Adult 
Intelligence, that the Wechsler-Bellevue Scales have an appreciable diag- 
nostic importance. According to Wechsler, such clinical groups as organic 
brain disease cases, psychotics, psychoneurotics, adolescent psychopaths, 
and mental defectives are characterized by differing performance or 
“scatter” on the verbal and performance scales of the Wechsler-Bellevue
Test. The evidence on this point, while voluminous, is far from consistent,
and sometimes leaves much to be desired in the way of appropriate control
groups. The reader desiring an introduction to this literature should con-
sult the reviews by Brody (9), Mayman (45), Rabin (56, 57) and Watson
(73). It appears that, at best, the various measures of “scatter” and score-
pattern successfully differentiate groups, but are of comparatively uncertain
value in the diagnosis of individuals. A special pattern of scores on the
Wechsler-Bellevue Scales may be caused by several factors unrelated to
psychopathy, so that caution must be exercised in using the scales diag-
nostically.

In the use of other tests of mental ability than the Wechsler-Bellevue 
for personality diagnosis, recent reports are less conflicting. Wallin and 
Hultsch (72) utilized the revised Stanford-Binet for such purposes and 
found it disappointing. On the other hand, practically all other recent re-
ports in this connection are favorable. Brown (14) found that the revised
Stanford-Binet could be employed by kindergarten teachers to throw 
valuable light on the personality make-up of young children. Sarason and 
Sarason (61), using the Kohs and revised Stanford-Binet Tests with cere- 
bral palsied defective children, observed distinctive test-score patterns. 
Porteus reported that the Q-score on the Porteus Maze Test “is a useful 
measure in the detection of the predelinquent and the potential criminal” 
(54, p. 103); Wright (77) also found the Porteus Maze Test to be useful 
in distinguishing delinquent boys from normals. Hunt and Older (34), 
using a brief screening test of intelligence and vocabulary in a sample of 
naval recruits, found scatter to be a valuable indicator of psychopathy. 
For an adult clinical sample of civilians, Rapaport, Gill, and Schafer (58) 
found scatter patterns to be diagnostic on the Wechsler-Bellevue Scales 
and on the Babcock Efficiency Test; and also found a sorting test of ab- 
straction to provide differential diagnostic indications. These authors have 
developed a more explicit and testable theory of the bases for patterns 
than have most previous writers, and have adduced considerable support- 
ing evidence. Their studies make a forward step in a difficult field. 

At least two other studies call for mention in this section. Eysenck (24), 
using a matrix test of intelligence, demonstrated that a normal group im- 
proved significantly more on retesting than did two neurotic groups. 
Goldstein (29) successfully employed the Army’s Visual Classification 
Test to develop a key for the detection of malingering on this test. 


Word-Association Tests 


The resurgence of interest in word-association tests probably arises from 
the relation of such tests to the increasingly popular “projective” technics. 
Studies of word association, however, continue to be confined almost ex- 
clusively to adult, “clinical” samples, rather than ordinary educational 
groups. One study of a normal sample is that by McIntosh (44); he scored
a free association test for contrast responses only, finding that workers 
whose job is to influence other people have a distinct tendency to give 
more contrast responses than individuals who work alone. In the clinical 
field, several authors have found the results of association tests to be useful 
for diagnostic purposes: among others, the studies by Rapaport, Gill, and
Schafer (58, 62, 63), Tendler (70), and Schnack, Shakow, and Lively
(64, 65) may be mentioned. In an interesting and original report, Welch, 
Diethelm, and Long (76) devised an association test composed of fifteen 
nonsense syllables of low association value. The test has some value as a 
measure of “elation” in psychiatric patients, since patients in an “elated”
condition give more associations to the nonsense syllables than do those in
a “nonelated” or “normal” state. 


Miscellaneous Tests of Personality 


Tests of personality need not always be of a verbal nature; and, as usual, 
several reports appeared during the past three years dealing with visual, 
motor, or performance devices for the investigation of personality. Simon 
(67) published a review of Mira’s form-tracing test, which he found to 
be a convenient, rapid method for detecting psychopathic personalities, as 
well as potential leaders among normal individuals. Koppe (40), in her 
psychosomatic study of fifty stuttering children, used the Ozertzky motor 
examination, and reported that it revealed marked disturbances in their 
motor functions. Brower (13), administering a modification of the Snoddy 
mirror-drawing test to forty-eight college students, found that the visuo- 
motor conflict induced by this test tapped certain clearly differentiated 
facets of personality. Louttit (43) used a mirror-tracing test on eighty-six 
problem men among naval personnel and on eighty-six normals, and found 
that it significantly differentiated the two groups. Yacorzynski and Ney-
mann (78) gave a figure completion test involving visual and motor com-
ponents to forty controls, thirty manic-depressives, and thirty schizo-
phrenics. They found that all the differences which the manic-depressives
displayed from the controls and the schizophrenics in completing the
figures could be accounted for by an increase in their motor activity.


Level of aspiration and stress tests also continued to be used for person- 
ality evaluation. Gruen (30) used a level of aspiration test, with a short- 
hand task and symbol substitutions, in a personality study of factors in 
adolescence. Schnack, Shakow, and Lively (64) employed an aspiration
level test in their studies of insulin and metrazol therapy. Hanawalt, Hamil-
ton, and Morris (31) administered Frank’s simple letter substitution test
to college leaders and non-leaders, and found that the average level of
aspiration of the former was significantly higher than that of the latter.
Ego-involvement in relation to levels of aspiration was both discussed
and investigated by Holt (32). Freeman (28) published detailed sugges-
tions for a standardized “stress” test to be used for experimental purposes.

In an attempt to study personality correlates of identification with
feminine role on the part of college girls in a course in psychology, Franck
(26) prepared nine pairs of pictures representing sex symbols in the guise
of art products. The subjects were asked to select the most “attractive” of
each pair of pictures, and their selections were correlated with question-
naire data on feminine attitudes. Franck concluded that “girls preferring
male symbols were more mature, i.e., accepted their role as women and
accepted men as their counterpart, while girls preferring female symbols
were less mature” (26, p. 117). Steinmetz (68) gave a new twist to the 
use of personality inventories in an attempt to measure psychological un- 
derstanding by having two persons manifesting interpersonal difficulties 
indicate how each other would respond to the questions of an inventory. 
Bennett (4) administered a test consisting of a list of sixty common 
annoyances to neurotic and non-neurotic hospital patients; and found 
that the test as a whole would have been unsuccessful in determining neu- 
roticism in 30 percent of the patients tested. Eysenck (23) reviewed 
methods of measuring appreciation of humor and determined the inter- 
correlations of the reactions of men and women to a variety of materials 
of an ostensibly humorous character. A second edition of Roback’s Sense 
of Humor Test (60) was published. 

Finally, miscellaneous tests of personality, ranging from the most sub- 
jective to “objective” extremes, continued to be devised and experimentally 
employed. Anthony (2) asked Canadian school children, aged eleven and 
twelve, merely to rank their preference for such words as house, obey, 
apple, song, and dead; and found that, as a test of social adjustment, word- 
ranking is a method of promising validity. Roach (59), working with 
college students, experimented with a test of the “plodding” type of per- 
sistence; his battery included both performance and verbal subtests. 
Cattell (16) continued his work with his miniature situations tests, which 
he described as an “objective test of character-temperament.” Buck (15)
invented a “philophobe” test, which consisted essentially of a controlled 
interview, with the test questions to be asked orally by the examiner. 
Curtis and Thorne (20) also published a rapid evaluation technic which 
is in the controlled interview form. Eysenck (25) utilized both a dark 
vision and a suggestibility test for the screening of army neurotics from 
normal soldiers. Atterbury (3) adapted the psychodrama for diagnostic 
testing purposes, and reported that the first results of experimenting in 
this direction appeared promising. 

Finally, personality is a complex, multi-faceted affair; and the instru- 
ments for the testing of personality are correspondingly diverse and 
numerous. Research in this field must continue to be as broad and as deep 
as it possibly can be. A not inconsiderable degree of progress in per- 
sonality testing has been made in the last three years; but more still 
remains to be achieved by future workers in this field.


CHAPTER VIII


Statistical Methods Related to Test 
Construction and Evaluation 


ROBERT M. W. TRAVERS 


This review is a continuation of the survey made by Conrad (18)
in the February 1944 issue of the Review. Additional surveys since 
that time include a review by Blommers (6) of recent developments in 
statistics, and a survey of new computational technics by Lorge (73). 

The studies reviewed in this chapter were first located by a search 
of Psychological Abstracts, The Education Index, The Cumulative Book 
Index, and the “Statistical Methodology Index” by Buros (9) which ap- 
pears quarterly. 


Texts 


Most of the texts published in the period under review were, as usual, 
of greater expository than research interest. Of the various texts, the 
first volume of a proposed two-volume work by Kendall (68) requires 
special mention. A book by Mather (79) presents a number of mathe-
matical methods which have been used to date largely by geneticists, but
which have many possible applications in educational research. 


Relations between the Characteristics 
of Items and of Tests


During the period covered by the present review, a number of papers 
appeared evaluating present technics for the selection of test items. 
Some of the most original of these contributions were based on the analogy 
between the traditional technics of psychophysics and current technics 
of measurement in aptitude and achievement tests. Finney (32), a 
statistician working in the field of agriculture, became acquainted with 
a paper published by Ferguson (27) in which it had been pointed out that 
the Müller-Urban constant process might be applied to the problem of
item selection. Ferguson showed that each test item may be described 
first in terms of a limen, which is a measure of the point at which the item 
discriminates, and second, in terms of the standard deviation of the 
limen which is a measure of the goodness of the discrimination. Finney 
observed that the problem discussed by Ferguson was essentially the same 
as that which toxicologists encounter in the treatment of the data derived 
from their experiments. In the case of a test item the subject can respond 
either correctly or incorrectly, while in experimental toxicology the sub- 
ject responds either by dying or by living. Finney showed how the 
methods used by the toxicologists supply a maximum likelihood solution 
to the problem of obtaining the best estimate of the limen and the stand- 
ard deviation of the limen. The method of the toxicologist, the probit 
analysis method, has the advantage of providing an estimate of the stand-
ard error of the standard deviation of the limen. The probit analysis
tables used in estimating the limen and the standard deviation of the 
limen are published by Fisher and Yates (34). In evaluating the probit 
analysis method it may be said that the method is laborious but has a 
sounder mathematical basis than most of the methods of item analysis 
which are now most widely used. 
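
The fitting step described above can be sketched in a few lines. The
following is a minimal illustration, not Finney's own worked procedure: it
assumes that examinees have been grouped at a few ability levels expressed
in roughly standard units, and it fits the normal ogive by iteratively
reweighted least squares (Fisher scoring), which yields the maximum
likelihood estimates. The function name and the data are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal curve: mean 0, SD 1


def fit_item_probit(levels, n_tested, n_correct, iterations=25):
    """Fit P(correct) = Phi(a + b * level) by Fisher scoring.

    Returns (limen, sd): the level at which half of the subjects pass
    the item, and the standard deviation of the fitted normal ogive.
    """
    a, b = 0.0, 1.0  # starting values (assumes levels in standard units)
    for _ in range(iterations):
        sw = swx = swxx = swz = swxz = 0.0
        for x, n, r in zip(levels, n_tested, n_correct):
            eta = a + b * x
            p = min(max(STD_NORMAL.cdf(eta), 1e-9), 1.0 - 1e-9)
            density = STD_NORMAL.pdf(eta)
            weight = n * density ** 2 / (p * (1.0 - p))
            working = eta + (r / n - p) / density  # working probit
            sw += weight
            swx += weight * x
            swxx += weight * x * x
            swz += weight * working
            swxz += weight * x * working
        # Weighted least-squares line through the working probits.
        b = (sw * swxz - swx * swz) / (sw * swxx - swx ** 2)
        a = (swz - b * swx) / sw
    return -a / b, 1.0 / b


# Hypothetical item: 200 examinees at each of five ability levels.
levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
n_tested = [200] * 5
n_correct = [5, 23, 69, 131, 177]
limen, sd = fit_item_probit(levels, n_tested, n_correct)
print(f"limen = {limen:.3f}, standard deviation = {sd:.3f}")
```

A small standard deviation indicates a sharply discriminating item; the
limen locates the point on the ability scale at which it discriminates.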

Lorr (74) also published a study which started out from the analogy 
between the methods of psychophysics and the methods of measurement 
used in aptitude and achievement tests. Lorr, however, was not so much 
concerned with the selection of items as with the problem of scoring the 
test as a whole in terms of a limen. Lorr showed that an amount-limit 
test, which is homogeneous with respect to content but carefully graded 
with respect to difficulty, presents a situation similar to that which occurs 
when a threshold is measured by the constant method. This is a reitera- 
tion of the well-known fact that if test items are carefully graded with 
respect to difficulty then, under ideal circumstances, a subject would 
answer the items correctly up to a certain point and answer them in- 
correctly beyond that point. In terms of traditional psychophysical con- 
cepts, the individual could be given a score in terms of the threshold 
where rights change to wrongs. Since various factors obscure the precise 
threshold point, various mathematical devices have been devised for 
estimating the threshold. The threshold may be estimated by determining 
the raw score, by making an estimate of the point where rights change to 
wrongs, or by calculating a dispersion parameter from an ogive curve 
fitted to the data. Lorr showed that the first two methods provided 
measures which agree well with each other and are of approximately equal 
reliability. However, the dispersion parameter was found to have little 
relationship to the other two. This latter conclusion is particularly 
important because of the common practice of measuring thresholds in 
terms of dispersion parameters. 

While the discussion of the relation of the item to the total test from 
the standpoint of psychophysics leads to very complex methods of item 
analysis if rigorous mathematical procedures are followed, practical 
considerations usually make such a procedure inadvisable because of the 
limited value of the data upon which an item analysis is based. Conse- 
quently, many workers in the field of test construction will be more 
concerned with minor revisions of common item analysis procedures which 
yield approximations. Davis (24) published a refinement of the well- 
known Flanagan chart in which the difficulty indices are directly com- 
parable regardless of the number of alternative choices in the item, and 
in which the discrimination indices are not estimates of a Pearson cor- 
relation coefficient but functions of Fisher’s z. These functions of z 
have the advantage of being subject to errors of measurement which are 
independent of the numerical values of the indices. While Davis’ refine-
ment of the Flanagan chart may be useful, it should be borne in mind that
the indices of discrimination are still only estimates of the discriminating
power of the items, being based on approximately half of the available
data. 
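
The property of Fisher's z referred to above may be illustrated briefly.
The sketch below is not a reproduction of Davis' chart; it shows only the
standard transformation and its approximate standard error, which depends
on the number of cases and not on the size of the correlation. The
function names are illustrative.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def z_standard_error(n_cases):
    """Approximate standard error of z; note that r does not enter."""
    return 1.0 / math.sqrt(n_cases - 3)


for r in (0.10, 0.50, 0.90):
    print(f"r = {r:.2f}  z = {fisher_z(r):.4f}  "
          f"SE(z) with 103 cases = {z_standard_error(103):.3f}")
```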

A simple and practical graphic form of item analysis was presented by 
Turnbull (102). Turnbull’s normalized graphic method provides informa-
tion about each option and reveals non-rectilinear relationships between
item and criterion when such are present. However, Turnbull points out 
that his method is valuable mainly for revising items, and should not be 
used when the added time required to secure detailed information is a 
major consideration. The method makes it possible to estimate visually the 
parameters which Ferguson (27) and Finney (32) were concerned with 
estimating with greater mathematical precision. 

Gulliksen (38) and Tucker (101) published studies on the relation- 
ship between item difficulties, item validities, and the total reliability of 
a test. Gulliksen discussed the relation between item difficulty and inter- 
item correlations on the one hand and total test variance and reliability 
on the other. In this paper, Gulliksen pointed out that, contrary to common 
belief, there is no logical reason for making the variance of a test a 
maximum and that for practical purposes the variance of the scores on a 
test should be an optimum rather than a maximum. Tucker discussed the 
relation between true score and test score and showed that, in a 100-item 
test, there is a maximum correlation between true score and test score when 
the point correlation between items is slightly greater than 0.2. However, in 
this paper, as in many others that discuss the relationship between item 
intercorrelations and test reliability, the conclusions are based on assump- 
tions which cannot be made when an actual test is involved. For example, 
in the case of Tucker’s study, one assumption made is that the interitem 
correlations are all of the same magnitude. Since such a condition is never 
likely to occur in practice, considerable caution should be used in applying 
the conclusions drawn from the study. In this connection, it may be noted 
that Carroll (13) has made a careful study of the effect of difficulty and 
chance success on correlations between items and between tests, and 
Wherry and Gaylord (109) examined the relation between the factor 
pattern of test items and intertest correlations. 


Reliability and Validity of the Test as a Whole 


Guttman (43, 44) and Wherry and Gaylord (108) made analyses of 
the concept of test reliability. Guttman (43) made an analysis of the 
sources of errors in test scores and divided them into three categories, 
namely, trials, persons, and items. He then defined test unreliability in 
terms of the variations from one trial to another and showed that while 
unreliability so defined could not be estimated from a single trial, yet 
a lower limit could be established for it on that basis. Guttman (44) 
also applied the same analytic procedure to the estimation of the relia- 
bility of qualitative data. He showed that when a single qualitative item 
is tried out on a single occasion with a large population, it is possible 
to calculate a lower bound to the group reliability coefficient. From two
experimentally independent trials it is possible to calculate also an upper
bound to the group reliability coefficient. This paper by Guttman described 
simple methods of calculating the lower and the upper bounds of the 
reliability. 

Wherry and Gaylord pointed out that most methods of estimating 
reliability of a test assume that a single factor runs thruout all the 
test items. They showed, on the basis of this assumption, that a test
composed of K factors would have its reliability erroneously estimated 
with a ratio of (n-K)/(n-1) to the true estimate. Consequently, if a 
test can be broken down into subtests, the Kuder-Richardson formula 
should not be used on the test as a whole. In such a situation the Kuder- 
Richardson formula should be applied to each subtest and then, by using 
the subtest reliability coefficients together with the intersubtest correla- 
tions, an estimate of over-all reliability can be computed. In the development 
of such a series of subtests it would be necessary to evaluate item validities 
in terms of the correlation of the item with the scores on the subtest and 
with the scores on the whole test. While the procedure suggested by 
Wherry and Gaylord is logically sound, there are relatively few situations 
in which it would be applicable, since most authors of tests are concerned 
either with the reliability of the individual subtests in their battery, or 
with total reliability when the material is too homogeneous to permit 
grouping in subtests. 
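
For reference, the Kuder-Richardson formula in question (formula 20) can
be computed for a single homogeneous subtest as sketched below. This is
the standard formula applied to a small hypothetical score matrix; it does
not implement the Wherry and Gaylord procedure for combining subtests.

```python
from statistics import pvariance


def kr20(item_scores):
    """Kuder-Richardson formula 20.

    item_scores holds one row per examinee; each entry is 1 (right)
    or 0 (wrong). At least two items are required.
    """
    n_people = len(item_scores)
    n_items = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    total_variance = pvariance(totals)  # variance of total scores
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_people
        sum_pq += p * (1.0 - p)  # item variance for a right/wrong item
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical subtest: four examinees, three items of graded difficulty.
subtest = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(f"KR-20 = {kr20(subtest):.3f}")
```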

Other contributions to the problem of estimating reliability include a 
paper by Kaitz (65) who developed, on the basis of the analysis of 
variance, a formula for the internal reliability of a test. Cronbach (19) 
proposed that tests be so devised that the odd items and the even items 
form strictly comparable groups with respect to content, form, difficulty, 
and range of difficulty, so that test reliability might be more adequately 
determined. Davis (22) described a method for determining the reliability 
coefficient of a test over a given range of ability when the reliability over 
a more limited or a more extended range is known. Kaitz (66) presented 
a discussion of the Davis paper, and additional comments were made by 
Martins (78). 

Brogden (7), Burt (10), and Thomson (90) pointed out that tests 
are frequently used on populations that are either more restricted or
less restricted than those used in the validation studies. Thomson dis-
cussed one aspect of this problem in a paper which outlines the necessary
and sufficient conditions for using the Karl Pearson formulas for estimat- 
ing the actual correlation between test results and subsequent school per- 
formance when only the tail end of the distribution is used, as happens, 
for example, when elementary-school children have been tested, but 
only those who enter secondary schools are included in subsequent studies. 
Both Brogden and Burt presented solutions to the converse problem of 
estimating the correlation between a predictor and a criterion in an un-
selected population when the correlation with a selected population is
known.

Both McNemar (76) and Meehl (80) gave simple presentations of the
problem of suppressor variables. Meehl pointed out that the selection of 
potential suppressor variables is primarily a psychological rather than a 
statistical problem. Meehl urged that a search for suppressor variables 
be made, tho it must be admitted that there is a lack of studies at the 
present time in which suppressor variables have played a major role. 
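
The action of a suppressor variable can be seen from the usual formula
for the multiple correlation of a criterion with two predictors. The
figures below are hypothetical and are not taken from McNemar or Meehl:
the second variable has no correlation with the criterion, yet it raises
the multiple correlation because it is correlated with the first.

```python
import math


def multiple_r_two_predictors(r1c, r2c, r12):
    """Multiple correlation of a criterion with two predictors.

    r1c, r2c: correlations of predictors 1 and 2 with the criterion.
    r12: correlation between the two predictors (must not be +1 or -1).
    """
    r_squared = (r1c ** 2 + r2c ** 2 - 2.0 * r1c * r2c * r12) / (1.0 - r12 ** 2)
    return math.sqrt(r_squared)


validity_alone = 0.40  # predictor 1 used by itself
with_suppressor = multiple_r_two_predictors(0.40, 0.00, 0.50)
print(f"predictor alone: {validity_alone:.3f}")
print(f"with suppressor: {with_suppressor:.3f}")
```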

Mosier (81) developed a method of estimating the reliability of a 
composite score from a knowledge of the weights of the components, and 
their dispersions and intercorrelations. According to Mosier, in order to 
achieve maximum reliability of a composite, it is necessary to give each 
component a weight proportional to the sum of the intercorrelations with 
the remaining components and inversely proportional to its error variance. 
The method is useful when, thru lack of a measurable external criterion, 
it is impossible to weight components directly for maximum validity. 
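
A sketch of the computation follows. It uses the general formula for the
reliability of a weighted composite under the assumption of uncorrelated
errors of measurement, and it assumes that the reliability of each
component is known in addition to the quantities named above. The
function name and the illustrative values are hypothetical.

```python
def composite_reliability(weights, sds, reliabilities, correlations):
    """Reliability of a weighted sum of component scores.

    correlations is the full intercorrelation matrix of the components,
    with unities in the diagonal. Errors are assumed to be uncorrelated.
    """
    k = len(weights)
    composite_variance = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * correlations[i][j]
        for i in range(k)
        for j in range(k)
    )
    error_variance = sum(
        (weights[i] * sds[i]) ** 2 * (1.0 - reliabilities[i])
        for i in range(k)
    )
    return 1.0 - error_variance / composite_variance


# Two equally weighted components, each of reliability 0.80,
# correlating 0.50 with each other.
r_composite = composite_reliability(
    weights=[1.0, 1.0],
    sds=[1.0, 1.0],
    reliabilities=[0.80, 0.80],
    correlations=[[1.0, 0.5], [0.5, 1.0]],
)
print(f"composite reliability = {r_composite:.3f}")
```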

Richardson (86) pointed out that the correlation coefficient has little 
meaning to the lay public as a measure of the predictive efficiency of a 
test. He attempted to develop a formula which would measure predictive 
efficiency directly in terms of the total effectiveness of the men selected. 
The main weakness of the suggested formula lies in the fact that it 
requires the use of an estimate of the ratio of the average effectiveness of 
the men selected to the average effectiveness of the men not selected by the 
test. The estimate of this ratio may be difficult to obtain and highly unreli- 
able. Brogden (8) discussed the same problem from a more technical aspect 
and showed that when two variables, a predictor and a criterion, have 
similar frequency distributions and when the regression of the predictor 
on the criterion is linear, then $r$ (rather than $1 - \sqrt{1 - r^2}$) may be con-
sidered to be a direct index of predictive efficiency.
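
The practical difference between the two indices is considerable, as a
short tabulation shows. The sketch below simply evaluates both for several
values of r; the function name is illustrative.

```python
import math


def forecasting_efficiency(r):
    """The older index of forecasting efficiency, 1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r * r)


for r in (0.20, 0.40, 0.60, 0.80):
    print(f"r = {r:.2f}   1 - sqrt(1 - r^2) = {forecasting_efficiency(r):.3f}")
```

For correlations of the size usually obtained with aptitude tests, the
older index gives a much less favorable impression than r itself.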

Sandon (87) discussed the use of the analysis of variance for estimating 
the reliability of tests in large-scale programs where the scores on the tests 
are not objective and where the unreliability of the scoring procedure 
must also be estimated. 


Factor Analysis 

During the period covered by the present issue of the Review a very 
large number of papers have been published on problems of factor analysis. 
However, most of these papers deal with minor refinements of arithmetical 
procedure or with short-cut methods. Few of the contributions could be 
described as presenting fundamental developments in the field, and in 
some quarters there are whispers of dissatisfaction about the large amount 
of energy which is being expended on mathematical details. In this connec- 
tion, Ferguson (28), in reviewing the applications of mathematical tech- 
nics to psychological problems, stated with reference to factor analysis 
that the present tendency is for psychometrics to become a branch of 
statistical mathematics as such, rather than a branch of psychology. 

One of the few general theoretical discussions of factor theory during 
the period was published by Guttman (42). This paper included an at-
tempt to give a rigorous justification of Thurstone’s Centroid Method on
the basis of a technic developed by Lagrange for reducing bilinear and 
quadratic forms. Guttman showed that Lagrange’s theorem proves that 
the centroid method does actually reduce the rank of the Gramian matrix 
by unity at each stage. The paper also discusses the direct factoring of the 
test score matrix. 
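
The reduction of rank at a single stage may be illustrated with a bare
centroid extraction. The sketch below is a simplification and not
Guttman's proof: it assumes that the communalities are already in the
diagonal and that no reflection of signs is required. When the matrix is
of rank one, the first centroid factor leaves residuals of zero.

```python
import math


def centroid_factor(matrix):
    """Extract one centroid factor.

    Returns (loadings, residual matrix). Communalities are assumed to
    be in the diagonal; sign reflection is not attempted.
    """
    k = len(matrix)
    column_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    grand_total = sum(column_sums)
    loadings = [s / math.sqrt(grand_total) for s in column_sums]
    residual = [
        [matrix[i][j] - loadings[i] * loadings[j] for j in range(k)]
        for i in range(k)
    ]
    return loadings, residual


# Hypothetical rank-one matrix built from the loadings 0.9, 0.8, 0.7.
true_loadings = [0.9, 0.8, 0.7]
correlations = [[a * b for b in true_loadings] for a in true_loadings]
loadings, residual = centroid_factor(correlations)
print([round(x, 3) for x in loadings])
print([[round(x, 3) for x in row] for row in residual])
```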

Other contributions of general theoretical interest were made by Thur- 
stone (95) and Holzinger (49, 50). Thurstone discussed the effects of the 
selection of tests on the outcomes of a factor analysis, pointing out that 
if a given test battery shows a simple structure, then the addition of tests 
which are linear combinations of those in the battery will not affect the 
structure. Holzinger (50) demonstrated the equivalence of the centroid 
method and Spearman’s method of factor analysis if the Spearman 
formulas are extended to include the communalities in the diagonal of the 
correlation matrix. In a separate paper, Holzinger (49) described a gen- 
eral procedure for obtaining a complete factor analysis of scores both 
when orthogonal and oblique factors are considered. One of the main 
objects of this latter paper of Holzinger was to point out the shortcomings 
of certain elementary statistical procedures. For example, the simple
average, he states, is adequate only when the data are of rank one; 
that is to say, when only one factor is involved. To employ a simple 
average for summarizing data of higher rank is to summarize the data 
inadequately. Davis (23) developed a method for determining the relia- 
bility of each of the components resulting from a factor analysis by the 
principal axis method. While the method is not completely rigorous it may 
prove useful. 

Most of the short-cut methods of factor analysis reduce the length of the 
mathematical procedures by introducing subjective judgment in place of 
rigorous calculation. For example, Thurstone (92) described a graphical 
method of factoring a correlation matrix. As with all graphical methods, 
subjective judgment plays an important part in the procedure. 

Tucker (99) described a compromise between the graphical short- 
cut methods of factor analysis and the routine application of analytic 
methods. In Tucker’s method, the axes for the subgroups of tests are 
located by analytic methods, but graphic data concerning the inter- 
relation of factors are used in the selection of subgroups. Tucker (100) 
also developed a procedure for determining the successive principal 
components of a correlation matrix without the necessity of computing 
the successive tables of residual correlations. 

Both Thurstone (94) and Holzinger (51) described short-cut methods 
of factor analysis which depend upon the grouping of tests within the 
battery according to their intercorrelations. In the Holzinger method, 
the correlation matrix is sectioned into portions, each of which has a 
rank of approximately unity; centroid coefficients for the variables in 
each section are then computed. This procedure is possible when the cor-
relation matrix presents a structure such as a bi-factor pattern.

Additional short-cut methods were described by Thurstone (97),
Zimmerman (110), and Carlson (12). Thurstone (97) described a 
method of rotating the axes which can be handled by a clerk. Zimmerman 
(110) described a simple apparatus for facilitating the graphical procedure 
for making orthogonal rotations of axes. Carlson (12) described a simple 
approximation procedure for factor analysis. It involves no inversion of 
signs of negative residuals, the estimations of as few communalities as 
there are factors, and relatively little work in the rotation of axes for 
arriving at meaningful results. However, this method like all other simpli- 
fied factorial methods involves assumptions which do not have to be made 
when the longer and more orthodox procedures are used. Carlson used 
hypothetical data to show empirically that the results produced by his 
method are comparable to those derived by the centroid method. Here, as 
elsewhere, it must be pointed out that empirical justification based on a 
single instance is only circumstantial evidence in favor of a particular 
technic. 

Davis (21) made a factorial analysis of nine tests of reading skills 
based on the classification by experts of a group of items. Davis made a 
principal axis solution which extracted as many factors as there were 
tests. Thurstone (98) criticized Davis’ procedure on the basis that Davis 
had obscured the underlying factors by imposing weights on the test 
scores according to the judgments of authorities. Thurstone made a re- 
analysis of data, first by using Spearman’s uni-dimensional method and 
then by using the centroid method, and arrived at a single factor solution 
with very small residuals indeed. Thurstone concluded that Davis’ nine 
tests failed to identify the components of the complex that we call reading 
ability. 

Lawley (71) discussed the problem of the outcomes of a factor analysis 
of a set of tests of unequal difficulty. He showed that a spurious factor 
may be introduced as a result of the differences in difficulty. 

Finally, mention must be made of the following: a paper by Ullman
(104), who suggested a time-saving modification of the iterative method 
of matrix inversion; discussions by Holzinger (52) and Thurstone (93) 
of second-order factors, in which the latter suggested that second-order 
factors may be of value in reconciling the various theories of intelli- 
gence; a paper by Kelley (67) which presented a variance-ratio test of 
the uniqueness of the principal-axis components as they exist at any 
stage of the Kelley iterative process; and a paper by Cattell (14) on 
cluster analysis. In this latter paper, Cattell compared the four main 
methods of determining the clusters in a correlation matrix, and the 
relative utility of factor analysis and cluster analysis. According to Cattell, 
cluster analysis may be used profitably as a first reduction of variables, in 
order to provide a brief list upon which a factor analysis may be 
made. In a later paper, Cattell (15) discussed methods of determining 
the choice of factors of rotation.


Tests of Significance


Papers devoted to tests of significance cover a wide range, from dis- 
cussion of the philosophy underlying the use of such tests to the publica- 
tion of tables and nomographs for use in specific situations. Simon (88) 
discussed the problem which arises when two experimental treatments 
produce statistically insignificant differences but the experimenter has to 
select one of the treatments. The problem is analogous to that which 
occurs when it is found that two tests differ insignificantly in predicting 
a criterion, but one of the tests must be selected. 


A number of papers by Festinger (29, 30, 31) developed a series of 
tests of the significance of the difference between the means of two 
samples for cases where it is not reasonable to assume that the samples 
are derived from a normally distributed population. In the last of these 
three papers (31), the extreme case is discussed where it is desired to 
test the significance of the difference of the means of two samples but 
where it is impossible to postulate the distribution function of the parent 
population. In such cases, the computed level of significance will be much 
lower than when it is possible to postulate a definite distribution of the 
parent population. 


Johnson and Tsao (63) presented a problem which had arisen in 
connection with some data on the height of boys and girls divided 
according to age group, in which a difficulty was encountered in making 
comparisons of the variability of various subgroups since both the mean 
height and the variability increase with age. In order to secure a valid 
comparison of the variabilities, it was necessary to make an allowance
for the differences in the means of the groups. Johnson and Tsao developed 
a technic for testing the hypothesis of equality of standard deviations 
after adjustment for the inequality of means. 


Barnard (2) described a test for homogeneity in a fourfold table in 
which only one set of marginal totals is fixed. Under these conditions 
the level of significance of an apparent departure from homogeneity is 
reduced. Fisher (33) criticized Barnard’s test of significance and ques- 
tioned its utility. In a later paper, Barnard discussed his test in relation 
to the problem of determining sample size. Gumbel (40) made an analysis 
of the sources of inaccuracy in the use of the chi-square test. In particular, 
the test is influenced by the choice of class interval and the practice of 
combining several cells at the extremes. Vajda (105) made a comprehen- 
sive survey of the measures of significance provided by chi-square for 
the main effects and interactions in contingency tables in which there are 
multiple interactions. Bancroft (1) discussed the biases in estimation that 
result from the use of preliminary tests of significance. 

Tables and graphs designed to assist in the application of tests of
significance were published by Thornton (91), Bliss (5), Fiske and 
Dunlap (35), Hayes (47), Lord (72), and Fisher and Yates (34). 
Thornton (91) published tables of coefficients of rank-difference correla-
tions that are barely significant at six different levels of significance. The
levels of significance are provided for values of N from 2 up to 30. Thorn-
ton pointed out the difficulty of evaluating the significance of rank-
difference correlations when they involve tie rankings. Kendall (69)
provided a detailed discussion of the problems of tie rankings. 
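
The coefficient to which such tables refer is computed as in the sketch
below. The familiar formula holds only when there are no tie rankings,
which is the limitation noted above. The data are hypothetical.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Rank-difference correlation, 1 - 6 * sum(d^2) / (N * (N^2 - 1)).

    Both arguments are lists of ranks 1..N without ties.
    """
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


first_ranking = [1, 2, 3, 4, 5]
second_ranking = [2, 1, 4, 3, 5]
rho = rank_difference_correlation(first_ranking, second_ranking)
print(f"rho = {rho:.2f}")
```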

Hayes (47) pointed out that the computation of the standard error of 
tetrachoric correlation coefficients has been a laborious task in the past, 
and presented tables to assist in this procedure. Bliss (5) published a 
table of the chi-square distribution for degrees of freedom 1 to 30, values 
of p from 0.1 to 0.001, and values of chi-square from 0 to 60. Fiske and 
Dunlap (35) described a method of developing a graphical test of the 
significance of the difference between pairs of frequencies. The null 
hypothesis tested is that both samples are derived from the same popula- 
tion, and that the best estimate of the population parameter is the weighted 
mean proportion of the two samples. Lord (72) developed an alignment 
chart for calculating the fourfold correlation coefficient. 
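
The null hypothesis stated above for the comparison of two frequencies
leads to the usual arithmetic test based on the weighted mean proportion.
The sketch below gives that arithmetic form with a normal approximation;
it is not the graphical procedure of Fiske and Dunlap, and the
frequencies are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_proportion_test(x1, n1, x2, n2):
    """Compare two sample proportions using the pooled estimate.

    Returns (z, two-sided probability under the normal approximation).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # weighted mean proportion
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / standard_error
    probability = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, probability


z, probability = pooled_proportion_test(30, 100, 20, 100)
print(f"z = {z:.3f}, two-sided probability = {probability:.3f}")
```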


Short-Cut Methods of Treating Quantitative Data 


A detailed review of recent developments in computational technics was 
prepared by Lorge (73). Short-cut methods of factorial analysis have 
already been discussed in the section of this chapter on factor analysis. 
Of special interest in connection with the present review are the short-cut 
methods which have been developed to facilitate the handling of prob- 
lems of prediction from test scores. Jenkins (56) published a simple 
method for estimating a product-moment r. Beall (4), Butsch (11), 
Naylor (82), and Dwyer (26) discussed methods of facilitating multi- 
variate analysis. Beall (4) discussed various ways of estimating the solu- 
tions of the equations necessary for solving discriminant-function prob- 
lems of the type described by Fisher. Beall showed empirically that his 
estimates yielded results very close to those arrived at by the usual 
lengthy computational procedures. However, this empirical finding does 
not justify the use of such short-cut methods in all instances. It is quite 
possible that, in the data examined, a simple linear summation of scores 
might have resulted in almost as effective discrimination between the 
groups as a more elaborate and more precise technic. Here, as in other 
areas, generalization should not be made from a single instance. 

Dwyer (26) developed a method of calculating multiple correlations 
which was essentially a variation of the Doolittle method. Naylor (82) 
developed a method of estimating multiple correlations between a criterion 
variable and more than two variables. The method involves the use of 
stereographic projection and combines a graphical procedure with cer- 
tain approximate equivalences similar to those used by Kelley for a similar 
purpose. Butsch (11) developed a worksheet for the Johnson-Neyman 
technic for determining the significance of the difference between two 
groups of individuals on one variable, when two other variables are held 
constant by statistical methods.

Guttman and Cohen (45) showed that, if a battery of tests is resolved
into r orthogonal common factors and n unique factors, then a series
of multiple regressions can be calculated from the factor loadings. For
example, it is possible to compute from the factor loadings, the regression
of any one test on the remaining tests, or the regression of any one factor
on the n tests. The computational procedures described might save a con-
siderable amount of time under certain conditions.

Krathwohl (70) developed a simple graphical method whereby the
achievement of different classes of students may be compared while tak-
ing into account differences in ability between the classes. Krathwohl
suggested that his method might be used to evaluate the teaching ability
of different instructors in an institution. However, since such data seldom,
in practice, demonstrate differences between instructors, and since no
test of the significance of differences is supplied, there is danger that
Krathwohl’s method might be misused by those who have little acquaint-
ance with statistical problems. This danger is particularly acute since the
method is presented for the use of those who cannot handle more complex
technics.

Norton (84) developed a method of successive approximation for find-
ing the departure from expectation in a complex contingency table of the
type 2” x R for the purpose of calculating chi-square.

Tables and nomographs which may be of value to the psychometrician
have been published by Fisher and Yates (34), Jackson and Phillips (53),
Jurgensen (64), Swineford (89), and Crow (20).

In many problems of psychology and particularly those of market 
research, much labor can be saved if the experimenter can determine in 
advance the size of the sample to be examined. Swineford (89) developed 
tables which are useful in this connection. Nordin (83) also provided a dis- 
cussion of the problem of sample size. 

The tables developed by Jackson and Phillips (53) are for use in 
predicting success or failure from various deciles of a predictor variable. 
The two-way frequency tables are reported in decile units and show the 
expected frequencies in each cell for correlations from 0.30 to 0.95 in- 
clusive. The tables also show the percent of successful and unsuccessful 
individuals in each decile for each value of r, and for failure ratios from 
20 to 80 percent. 

Additional contributions to the area of computational aids include a 
paper by Waugh and Dwyer (106) on the extension of compact methods 
of computing the inverse of a matrix to cases where the matrix is non- 
symmetrical; a discussion by Hall, Welker, and Crawford (46) on the 
use of tabulating machines for the extraction of factors; and a technic 
developed by Grossman (37) for weighting individual responses on the 
I.B.M. without the necessity of scoring the papers more than once. Finally,
one of the oldest and simplest short-cut technics, namely the grouping of
data, was carefully scrutinized by Jarrett (54), who concluded that the
size of the class interval should depend on the ratio of the standard error
resulting from the grouping to the standard error of random sampling.


McNamara and Weitzman (75) showed that item analyses made by 
the I.B.M. Graphic Item Counter are more accurate than those made by
hand and take only one-eighth the time. A second contribution on the 
use of the scoring machine was made by Herfindahl (48), who discussed 
several methods of reading standard scores directly from the scoring 
machine. 


Applications of the Analysis of Variance 


There are still relatively few papers published in the field of psychologi- 
cal measurement in which the analysis of variance is used as a tool to 
test a psychological or an educational hypothesis, or in which an experi- 
mental design is selected with a view to testing hypotheses by the analysis 
of variance. Sandon’s paper (87) has already been reviewed in this con- 
nection. Of the five papers reviewed in this section, two are primarily 
concerned with experimental design, while the other three are mainly 
concerned with the treatment of results. 


Johnson and Tsao (61) demonstrated how proper factorial design 
might improve the precision of a psychophysical experiment and enable 
the experimenter to determine both the effect of a number of factors and 
the effect of their interactions. The particular experiment selected for the 
purpose of the demonstration was that of determining the differential limen 
of subjects for weight increasing at a constant rate. The factorial design 
was of the type: 4 rates of increase x 7 weight levels x 2 sexes x 2 sights 
(blind or seeing) x 2 dates. The design provided considerably greater 
precision than the traditional design of psychophysical experiments. In a 
second paper by these same authors (62) an additional example was 
provided of the use of factorial design and of the analysis of variance in 
the treatment of test data. 

Two papers by Cochran (16, 17) are of considerable importance in 
the present connection. In one of these papers (17), Cochran discussed the 
use of multivariate analysis for determining equivalence, linear relation- 
ship, and the relative accuracy or sensitivity of two or more scales used 
in the same experiment. In the second paper, Cochran (16) discussed the 
problem of weighting percents to take into account unequal numbers, and 
various criteria were suggested for deciding whether to use binomial,
partial, or equal weighting procedures. Once a system of weighting had
been established, Cochran provided criteria for determining whether a 
system of angular weighting is desirable. It may be noted that angular 
transformations have been used in the past as part of the analysis of vari- 
ance when the data have been given in terms of percents. 
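
As an illustration, the angular transformation is commonly written as the
angle whose sine is the square root of the proportion. The sketch below
uses that textbook form, which is an assumption here and not necessarily
the exact procedure of Cochran's paper; the function names are
hypothetical.

```python
import math

def angular_transform(p):
    """Return arcsin(sqrt(p)) in degrees for a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.degrees(math.asin(math.sqrt(p)))

def approx_variance_degrees(n):
    """Approximate sampling variance of the angle, in squared degrees.

    Uses the usual large-sample result 1 / (4n) in squared radians,
    which depends on the number of cases n but not on p.
    """
    return (180.0 / math.pi) ** 2 / (4.0 * n)

percents = [5, 25, 50, 75, 95]
angles = [angular_transform(x / 100.0) for x in percents]
```

The point of the transformation is that the approximate variance of the
angle depends only on the number of cases, whereas the variance of a raw
percent also depends on the percent itself.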

The methods of analysis of variance and covariance have been de- 
veloped mainly in areas in which the data are composed of equal or 
proportionate numbers of observations in subclasses. However, in educa- 
tion and psychology it is common to find unequal representation in the 
subclasses of a multiple classification table. Under such conditions of
disproportionate representation it is necessary to apply special mathe- 
matical methods if the analysis of variance is used. Tsao (103) described 
ways of handling such problems and also suggested approximate solutions. 


Treatment of Qualitative Data 


Guttman (41) and Wherry (107) published papers on the quantitative 
treatment of qualitative data. Guttman discussed the quantitative treat- 
ment of data of the type derived from public opinion polls. He pointed out 
that qualitative items may be scaled provided that they have “sameness” 
of content and that each item behaves as a simple function of scores 
derived from the distribution of the items. The latter condition had been 
fully discussed in a paper by Goodenough (36), who attempted to 
develop a technic for determining whether that condition could be 
assumed to exist. 

Gulliksen (39) also discussed scaling procedures and made a careful 
examination of scales constructed by the classical method of paired com- 
parisons. He showed that scales constructed by this method satisfy the 
broader definition of measurement, and have valuable properties not 
possessed by ordinal scales or by rank order scales. 

McNemar (77) surveyed the current methodology of the measure- 
ment of opinions and attitudes. This survey included a critical discus- 
sion of scaling technics and of the merits of measuring opinion by means 
of a scale rather than by the single question. McNemar deplored the
lack of work on the reliability and validity of measures of attitude.

The paper by Wherry (107) described a technic for weighting qualitative
data, such as biographical material derived from questionnaires,
for predicting success on an independent criterion. The method is presented
with a few cautions, such as the need for cross validation (i.e.,
validation in a new sample). However, the chief danger inherent in such 
statistical technics is not mentioned, namely, that they are becoming 
widely used as the basis for a crudely empirical approach to psychologi- 
cal problems in place of the rational method of science. 

Thurstone (96) presented a discussion of the central concept of meas- 
urement in situations such as the public opinion poll, the prediction of 
political elections, or the prediction of consumer choices. His central 
theorem is that if the average liking or disliking for three or more 
psychological objects is the same for each object, then the object for which 
there is the greatest range of liking and disliking is the one which will 
receive the largest number of first-choice votes. This concept, which 
Thurstone refers to as “discriminal dispersion,” has important implications 
not only for market research but also for measurement of social attitudes. 
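
The theorem can be checked informally by simulation. The sketch below
assumes that each voter's liking for each object is an independent normal
deviate with a common mean, and that the voter's first choice is the
object liked most; these assumptions, the function name, and the numerical
values are illustrative and are not taken from Thurstone's paper.

```python
import random

def first_choice_counts(sds, n_voters=20000, mean=0.0, seed=0):
    """Count first-choice votes for objects with equal mean liking.

    sds holds one standard deviation (dispersion of liking) per object.
    """
    rng = random.Random(seed)
    counts = [0] * len(sds)
    for _ in range(n_voters):
        # One simulated voter: a liking value for every object.
        likings = [rng.gauss(mean, sd) for sd in sds]
        counts[likings.index(max(likings))] += 1
    return counts

# Three objects, same average liking, increasing dispersion.
counts = first_choice_counts([0.5, 1.0, 2.0])
```

Under these assumptions the object with the widest dispersion collects the
most first-choice votes, although its average liking is no higher than
that of the others.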


Measures of Correlation 


Johnson (57) pointed out the rather obvious but often neglected fact
that errors of measurement may increase as well as reduce an estimate of 
correlation, and that consequently such errors may result in an estimate
of r which is numerically greater than 1.00 after the correction for attenua- 
tion has been made. 
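
A small numerical illustration, using the familiar correction formula in
which the observed correlation is divided by the square root of the
product of the two reliability coefficients. The figures are invented for
the example and the function name is hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Return r_xy / sqrt(r_xx * r_yy), the corrected correlation."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed r of .70 with sample reliabilities of .60 and .75.
corrected = correct_for_attenuation(0.70, 0.60, 0.75)
exceeds_unity = corrected > 1.0
```

Here the sampling errors in the three coefficients combine so that the
corrected value is greater than 1.00, which is the situation described
above.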

Both Jaspen (55) and Burt (10) described a coefficient of correlation 
which could be used when the criterion yields a threefold classification.
Jaspen, however, extended his procedure so that it could be used with 
data classified into four or more categories. 

Johnson (58, 59, 60) discussed the merits of using multiple contingency 
technics instead of multiple correlation technics for predicting a criterion 
from a number of variables. Johnson (58) enumerated the criteria that 
could be used in dividing a continuous variable into a dichotomy. 
While Johnson argued that the main advantages of multiple contingency 
technics are economy of time and the ease of obtaining results for inspec- 
tion, it is doubtful whether these advantages fully compensate for their 
statistical inefficiency. 

Considerable attention has been devoted in recent years to devising 
a statistic which would indicate the presence of a relation between suc- 
cessive observations. Such a statistic is widely needed in economics and 
could be of value in educational research. Dixon (25) developed ratio 
functions for testing hypotheses related to such problems of serial corre- 
lations. 
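
One familiar statistic of this general kind compares the squared
differences between successive observations with the squared deviations
from the mean. The sketch below is offered only as an example of a ratio
sensitive to serial relation; it is an assumption for illustration and is
not claimed to be one of the functions Dixon developed.

```python
def successive_difference_ratio(xs):
    """Sum of squared successive differences over sum of squared deviations.

    Values well below 2 suggest that neighboring observations resemble
    each other; values well above 2 suggest that they alternate.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(xs) / n
    ss_dev = sum((x - mean) ** 2 for x in xs)
    if ss_dev == 0:
        raise ValueError("observations must not all be equal")
    ss_diff = sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))
    return ss_diff / ss_dev

trend = successive_difference_ratio([1, 2, 3, 4, 5])      # steady drift
alternating = successive_difference_ratio([1, -1, 1, -1])  # zigzag
```

For observations with no serial relation the ratio is expected to be near
2, so that departures in either direction indicate dependence between
successive observations.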

Peters (85) developed a new descriptive statistic which is related to 
the second-order parabola in much the same way as the correlation 
coefficient is related to the regression coefficient. The statistic describes 
the general trend of the regression and the nature of its curvilinearity. 
However, Peters notes that for actual prediction, rather than for rough 
description, it is still necessary to resort to fitting a curve of the
appropriate kind.